Sentiment Analysis on Social Media Text. Siddhartha Banerjee (sub253) Eric Obeysekare (ero5004) IST 557: Data Mining Project

Agenda
- What is sentiment analysis?
- Basic concepts
- Literature overview
  - General approaches
  - Approaches on social media datasets
- Summary and discussion

Sentiment Analysis
- Identify and extract subjectivity in text
- Also known as opinion mining
- Textual information consists of opinions and facts
- Subjectivity example: "I bought an iPhone a few days ago. It is such a nice phone."
- We concentrate only on opinions, but some factual statements also carry sentiment: "My iPhone broke in just two days."

Sentiment Analysis: What are the classes?
- Generally: positive, negative and neutral
- However, this is not always the case
- It might make sense to also understand how positive or how negative
  - "My iPhone broke in just two days." is partly fact, partly negative
- Why not have a scale (1-5)? This is multi-class classification.
  - Pang and Lee (2005) exploited knowledge from star ratings on websites
- How to create the dataset?
  - Inter-annotator agreement

Sentiment Analysis on Social Media
- Sentiment analysis has mostly been applied to customer reviews of products, movies, etc. (Turney, 2002)
- Limited work has been done on social media text
- Tweets are just 140 characters, yet users express opinions in them (Kouloumpis et al., 2011)
  - Opinions on movies, elections, other sensitive issues, etc.
- Real-time assessment of user sentiment on specific topics might be helpful

Literature overview
- Several types of methods implemented on review datasets
  - Dictionary-based approaches
  - Supervised-learning-based approaches
  - Aspect-extraction-based approaches
- Approaches specifically on social media text

Dictionary-based approaches
- Paper: Lexicon-based methods for sentiment analysis (Taboada et al., 2011)
- Example: "The performances were all really fantastic."
- Find polarities of individual words, if they are in a lexicon
- Lexicons can be manually compiled or generated automatically (Pang and Park, 2005)
- Dictionary-based approaches work well, but...
- The issue of domain specificity: positive words in one domain might be negative in another
  - "The hotel room is really huge."
  - "The USB stick is really huge."
- Build domain-specific lexicons!
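The word-level lookup and the domain-specificity problem can be illustrated with a minimal sketch. The lexicon entries and their scores below are hypothetical, and this is plain score summation, not the full method of Taboada et al. (2011).

```python
# Hypothetical toy lexicon: word -> polarity score (values are illustrative only).
GENERAL_LEXICON = {"fantastic": 3, "nice": 2, "cool": 2, "clear": 1, "broke": -2}

# Hypothetical domain-specific overrides: the same word can flip polarity by domain.
DOMAIN_LEXICONS = {
    "hotel": {"huge": 2},   # a huge hotel room is good
    "usb": {"huge": -2},    # a huge USB stick is bad
}


def lexicon_score(text, lexicon):
    """Sum the polarity of every word found in the lexicon; unknown words score 0."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    return sum(lexicon.get(t, 0) for t in tokens)


def label(score):
    """Map a numeric score to one of the three usual classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def domain_lexicon(domain):
    """Merge the general lexicon with the overrides for one domain."""
    return {**GENERAL_LEXICON, **DOMAIN_LEXICONS.get(domain, {})}


print(label(lexicon_score("The performances were all really fantastic.", GENERAL_LEXICON)))
print(label(lexicon_score("The hotel room is really huge.", domain_lexicon("hotel"))))
print(label(lexicon_score("The USB stick is really huge.", domain_lexicon("usb"))))
```

With only the general lexicon, both "huge" sentences would score as neutral, which is the gap that a domain-specific lexicon is meant to close.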

Supervised-learning-based approaches
- Sentiment analysis is a text classification problem
- Generate a training and test set
- How to generate feature vectors from text data?
- Example: "John likes to watch movies. Mary likes movies too."
- Create a vocabulary: {"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}
- Represent the text using elements from the vocabulary:
  - [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
- The values are encoded by frequencies. You can use binary values too.
- Single words are unigrams
- Two words together are bigrams (John likes, likes to, to watch, ...)
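The vocabulary example can be reproduced with a few lines of Python. This is a minimal sketch: the tokenizer is a simple regular expression chosen for illustration, and the function names are not from any particular library.

```python
import re
from collections import Counter

# Vocabulary from the slide, in index order 1..10.
VOCABULARY = ["John", "likes", "to", "watch", "movies",
              "also", "football", "games", "Mary", "too"]


def tokenize(text):
    """Split text into word tokens, dropping punctuation (case is preserved)."""
    return re.findall(r"[A-Za-z']+", text)


def bag_of_words(text, vocabulary, binary=False):
    """Return one count (or 0/1 flag if binary=True) per vocabulary entry."""
    counts = Counter(tokenize(text))
    vector = [counts[word] for word in vocabulary]
    return [min(c, 1) for c in vector] if binary else vector


def bigrams(tokens):
    """Return adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


text = "John likes to watch movies. Mary likes movies too."
print(bag_of_words(text, VOCABULARY))               # [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
print(bag_of_words(text, VOCABULARY, binary=True))  # [1, 1, 1, 1, 1, 0, 0, 0, 1, 1]
print(bigrams(tokenize("John likes to watch movies.")))
```

Bigrams are computed per sentence here so that no pair spans a sentence boundary.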

Classification problem
- Thumbs up? Sentiment Classification using Machine Learning Techniques (Pang et al., 2002)
- Movie review dataset: 1400 reviews (balanced positive and negative)
- Reasonably good performance on this dataset with a simple set of features!
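To make the classification setup concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over unigram counts, one of the classifier families evaluated by Pang et al. (2002). The four training documents are hypothetical toy data, not the movie review dataset, and the code is an illustration rather than a reproduction of the paper's experiments.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns (class_counts, word_counts, vocab)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in class_counts:
        logp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][token] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical toy training set.
TRAIN = [
    ("a great and moving film".split(), "pos"),
    ("great acting great story".split(), "pos"),
    ("a dull and boring film".split(), "neg"),
    ("boring plot terrible acting".split(), "neg"),
]

model = train_nb(TRAIN)
print(predict_nb(model, "great story".split()))
print(predict_nb(model, "boring film".split()))
```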

Aspect extraction
- Fine-grained sentiment analysis
- Not just overall sentiment, but sentiment focused on aspects
- Example: "I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too."
- A frequency-based approach (Hu and Liu, 2004):
  - Nouns (NN) that are frequently talked about are likely to be true aspects (called frequent aspects)
  - Find the adjectives that modify such nouns
  - Nearest adjective rule
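The frequency-based idea and the nearest adjective rule can be sketched as follows. This is a simplification for illustration, not the full procedure of Hu and Liu (2004): the sentences are hand-tagged with assumed POS tags instead of using a tagger, and "frequent" is defined by a hypothetical `min_count` threshold on raw noun counts.

```python
from collections import Counter

# Hand-tagged (word, POS) sentences; the tags are assumed for illustration.
TAGGED_REVIEWS = [
    [("the", "DT"), ("screen", "NN"), ("is", "VBZ"), ("really", "RB"), ("cool", "JJ")],
    [("the", "DT"), ("screen", "NN"), ("is", "VBZ"), ("bright", "JJ")],
    [("the", "DT"), ("voice", "NN"), ("is", "VBZ"), ("clear", "JJ")],
    [("the", "DT"), ("voice", "NN"), ("sounds", "VBZ"), ("clear", "JJ"), ("too", "RB")],
    [("the", "DT"), ("box", "NN"), ("was", "VBD"), ("small", "JJ")],
]


def frequent_aspects(sentences, min_count=2):
    """Nouns that occur at least min_count times are treated as aspects."""
    counts = Counter(word for s in sentences for word, tag in s if tag == "NN")
    return {word for word, c in counts.items() if c >= min_count}


def nearest_adjective(sentence, aspect_index):
    """Return the adjective closest (in token distance) to the aspect, or None."""
    candidates = [(abs(i - aspect_index), word)
                  for i, (word, tag) in enumerate(sentence) if tag == "JJ"]
    return min(candidates)[1] if candidates else None


def aspect_opinions(sentences, min_count=2):
    """Pair every occurrence of a frequent aspect with its nearest adjective."""
    aspects = frequent_aspects(sentences, min_count)
    pairs = []
    for sentence in sentences:
        for i, (word, tag) in enumerate(sentence):
            if tag == "NN" and word in aspects:
                pairs.append((word, nearest_adjective(sentence, i)))
    return pairs


print(sorted(frequent_aspects(TAGGED_REVIEWS)))
print(aspect_opinions(TAGGED_REVIEWS))
```

The noun "box" appears only once, so it falls below the threshold and is not reported as an aspect.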

Social Media

Predicting Political Opinion
- Using Twitter to predict election results (Tumasjan et al., 2010)
- Counting tweets
- Sentiment analysis with LIWC (Linguistic Inquiry and Word Count)

An LIWC Example

LIWC Dimension               Jessie's emails  Jessie's site  Personal Texts  Formal Texts
Self-references (I, me, my)             4.53           2.87            11.4           4.2
Social words                           14.34           3.69             9.5           8.0
Positive emotions                       1.89           1.91             2.7           2.6
Negative emotions                       0.00           0.41             2.6           1.6
Overall cognitive words                 6.42           6.01             7.8           5.4
Articles (a, an, the)                   7.55           5.60             5.0           7.2
Big words (> 6 letters)                22.26          37.98            13.1          19.6

Discussion
- What are some issues with the use of LIWC in the analysis of Twitter data?
- How could LIWC be used to analyze tweets?

Predicting Political Opinion (cont.)
- Comparing Twitter sentiment to political polls (O'Connor et al., 2010)
- Sentiment analysis
  - OpinionFinder
  - Daily ratio of positive to negative words
- Message retrieval
  - Keywords, hashtags
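A daily positive-to-negative ratio for messages retrieved by keyword might be computed along these lines. This is a minimal sketch: the two word sets are hypothetical stand-ins for a real subjectivity lexicon such as OpinionFinder's, and the messages and dates are invented for illustration.

```python
from collections import defaultdict

# Hypothetical stand-ins for a real subjectivity lexicon.
POSITIVE = {"good", "great", "hope"}
NEGATIVE = {"bad", "worse", "fear"}


def daily_sentiment_ratio(messages, keyword):
    """messages: iterable of (day, text). Returns {day: positive/negative word ratio}.

    Only messages containing the keyword are counted. Days with no negative
    words are skipped because the ratio is undefined there.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for day, text in messages:
        tokens = text.lower().split()
        if keyword not in tokens:
            continue
        pos[day] += sum(t in POSITIVE for t in tokens)
        neg[day] += sum(t in NEGATIVE for t in tokens)
    return {day: pos[day] / neg[day] for day in sorted(neg) if neg[day] > 0}


# Invented example messages.
MESSAGES = [
    ("2008-10-01", "economy looks good"),
    ("2008-10-01", "economy is bad and getting worse"),
    ("2008-10-01", "great hope for the economy"),
    ("2008-10-02", "fear about the economy"),
    ("2008-10-02", "weather is great"),
    ("2008-10-02", "economy good"),
]

print(daily_sentiment_ratio(MESSAGES, "economy"))
```

The resulting daily series is what would then be compared against poll time series, typically after smoothing over several days.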

Tweet selection
- Twitter APIs
  - Search: max 3,200 tweets, 180 searches/15 minutes
  - Streaming: real-time, returns a subset
  - Firehose: real-time, premium service
- Automatic Topic-focused Monitor (ATM) (Li et al., 2013)
  - Sample tweets for some keywords
  - Select most relevant keywords from that sample
  - Query Twitter with new keywords
  - Iterate
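The sample, select, query, iterate loop can be sketched over an in-memory list of tweets. Everything here is an assumption made for illustration: `search` is a stand-in for a real Twitter query, and keyword relevance is approximated by simple co-occurrence counts, which is a much cruder criterion than the selection method of Li et al. (2013).

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "in", "for", "to", "and", "of", "on"}


def search(corpus, keywords, limit=100):
    """Stand-in for a Twitter query: tweets containing at least one keyword."""
    return [t for t in corpus if keywords & set(t.lower().split())][:limit]


def select_keywords(sample, current, k=1):
    """Pick the k words that appear in the most sampled tweets (ties: alphabetical)."""
    counts = Counter(
        word
        for tweet in sample
        for word in set(tweet.lower().split())
        if word not in STOPWORDS and word not in current
    )
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word for word, _ in ranked[:k]}


def monitor(corpus, seed_keywords, rounds=2, k=1):
    """Iteratively expand the keyword set: sample, select, query again."""
    keywords = set(seed_keywords)
    for _ in range(rounds):
        sample = search(corpus, keywords)
        new_keywords = select_keywords(sample, keywords, k)
        if not new_keywords:
            break
        keywords |= new_keywords
    return keywords


# Invented example tweets.
CORPUS = [
    "election debate tonight",
    "debate on the economy",
    "election polls and the debate",
    "cat video",
    "polls open for the election",
]

print(sorted(monitor(CORPUS, {"election"}, rounds=2, k=1)))
```

Starting from the single seed "election", the loop picks up "debate" in the first round and "polls" in the second, while the unrelated tweet is never retrieved.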

Predicting Political Opinion (cont.)
- Real-time debate performance (Diakopoulos & Shamma, 2010)
- Sentiment determined manually
  - Mechanical Turk

Predicting Real-world Outcomes with Twitter
- Box office sales (Asur & Huberman, 2010)
- Tweet rates
- Sentiment analysis
  - Supervised learning model
  - Training data: Mechanical Turk
  - LingPipe: computational linguistics package
    - DynamicLMClassifier

Predicting Real-world Outcomes with Twitter (cont.)
- Subjectivity measure
- Multiple samples

Predicting Real-world Outcomes with Twitter (cont.)
- Positive-negative ratio
- Multiple samples

Discussion
- What is the best method to use for sentiment analysis of tweets?
  - Dictionary-based?
  - Supervised learning model?
  - Something else?
- How can we select tweets?

Conclusion
- Twitter is a good approximation of real-world opinions
- Multiple approaches with different benefits and drawbacks
  - Dictionary-based: plug and play
  - Supervised learning: more customizable
- Tweet selection is hard!

Questions?

References
- Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
- Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis: The good the bad and the OMG! ICWSM, 11, 538-541.
- Tumasjan, A., Sprenger, T., Sandner, P., & Welpe, I. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. ICWSM, 178-185. Retrieved from http://www.aaai.org/ocs/index.php/icwsm/icwsm10/paper/viewfile/1441/1852
- Diakopoulos, N. A., & Shamma, D. A. (2010). Characterizing debate performance via aggregated Twitter sentiment. Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10), 1195-1198. doi:10.1145/1753326.1753504

References (cont.)
- O'Connor, B., Balasubramanyan, R., Routledge, B. R., & Smith, N. A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. ICWSM, 11, 122-129. Retrieved from http://www.aaai.org/ocs/index.php/icwsm/icwsm10/paper/viewpdfinterstitial/1536/1842
- Asur, S., & Huberman, B. (2010). Predicting the future with social media. 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), 1, 492-499. Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5616710
- Li, R., Wang, S., & Chang, K. C. C. (2013). Towards social data platform: Automatic topic-focused monitor for Twitter stream. Proceedings of the VLDB Endowment, 6(14), 1966-1977.