SENTIMENT CLASSIFICATION OF MOVIE REVIEWS USING LINGUISTIC PARSING

Brian Eriksson
bceriksson@wisc.edu
CS 838 - Natural Language Processing
Final Project Report

ABSTRACT

The problem of sentiment analysis requires a deeper understanding of the English language than previously established techniques in the field obtain. The Linguistic Tree Transformation Algorithm is introduced as a method to exploit the syntactic dependencies between words in a sentence and to disambiguate word senses. The algorithm is tested against the established Pang/Lee dataset and a new collection of Roger Ebert reviews. A new method of objective sentence removal is also introduced to improve established methods of sentiment analysis on full reviews with no manual extraction of objective sentences.

1. INTRODUCTION

The last few years have seen an explosion in the number of papers on the topic of sentiment analysis. This is a fundamental shift in the area of Natural Language Processing. Previously, the underlying problem had been one of topic classification, where one is concerned only with what is being communicated. With sentiment analysis, a deeper understanding of the document must be extracted; the concern shifts from what is being communicated to how it is being communicated. Previous papers on this problem ([1-3]) ignore the fundamental richness of the English language used to communicate sentiment, and instead rely on established methods (N-grams, etc.) that throw many of these useful features away. New methods that use the power of linguistic techniques must be found to improve sentiment classification rates. Any new method must address two large problems in the area of sentiment analysis: the non-local dependencies problem and the word sense disambiguation problem.

2. PREVIOUS WORK

The cornerstone of work on sentiment analysis is Pang and Lee's 2002 paper [1]. The authors compare Naive Bayes, Maximum Entropy, and Support Vector Machine approaches to classifying the sentiment of movie reviews. They explain the relatively poor performance of these methods (versus a standard topic classification problem) as a result of sentiment analysis requiring a deeper understanding of the document under analysis. In 2005 ([3]), they returned to the topic and examined multi-class performance on a finer-scale star rating dataset. They added a nearest neighbor classifier to their collection of approaches, but the results still show great room for improvement.

A better approach is taken by Matsumoto et al. in [2]. The authors recognize that word order and the syntactic relations between words are extremely important for sentiment classification, and that it is therefore imperative not to discard them. The approach they propose takes each sentence of a review and constructs a dependency tree. This dependency tree is then pruned to create subtrees for classification. These subtrees capture the connections between words while retaining their syntactic relationships and order in the original sentence. One drawback to this approach is that a great number of subtrees are produced in the training stage, and for performance reasons all but the N most frequent subtrees are discarded.

Outside the area of sentiment analysis, focusing instead on classification of documents using linguistic parsing, is the work of Michael Collins in [4]. Collins develops a distance metric for extracting dependency bigrams from linguistic tree structures. The classification rates are quite good, but the algorithm runs fairly slowly. A simpler tree parsing algorithm may achieve similar classification rates while making the processing time practical.

3. NON-LOCAL DEPENDENCIES PROBLEM

One of the fundamental problems in extracting meaning from a sentence is the non-local dependency problem. Often, two words that are syntactically linked in a sentence are separated by several other words. In these cases, N-gram models with small N fail to extract a correlation between the two words. A new method must be devised to find pairs of words that are syntactically linked. A clear example of this problem is found in the following sentence (from [6]): "In a movie this bad, one plot element is really idiotic." To anyone reading this sentence, it should be obvious that two sentiment-bearing ideas are being communicated: first, that the movie was bad, and second, that the plot element was idiotic. Using standard N-gram approaches, a trigram model would be necessary to capture the dependency (movie, bad), and a 5-gram model would be necessary for (plot, idiotic). A powerful classifier for sentiment analysis would extract the non-local bigrams (movie, bad) and (plot, idiotic) while collecting a minimum number of other, sentiment-lacking bigrams.
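To make the required window sizes concrete, the following minimal Python sketch (an illustration added for this discussion, not part of the original system) computes the smallest N-gram window that covers each syntactically linked pair in the example sentence.

# Illustrative sketch: the smallest N-gram window covering two
# syntactically linked words in the example sentence from [6].
sentence = "In a movie this bad , one plot element is really idiotic".split()

def smallest_window(words, w1, w2):
    # An N-gram covers both words only when N spans their distance.
    i, j = words.index(w1), words.index(w2)
    return abs(i - j) + 1

for pair in [("movie", "bad"), ("plot", "idiotic")]:
    print(pair, "-> needs N =", smallest_window(sentence, *pair))
# (movie, bad) -> needs N = 3 (a trigram)
# (plot, idiotic) -> needs N = 5 (a 5-gram)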

4. WORD SENSE DISAMBIGUATION PROBLEM

The word sense disambiguation problem is mentioned by the authors of [1] as a fundamental problem in the area of sentiment analysis. As an example, they use two sentences:

Sentence 1: "I love this story."
Sentence 2: "This is a love story."

It should be obvious to the reader that the first sentence communicates positive sentiment, while the second is an objective statement with neutral sentiment. The problem with standard Natural Language Processing techniques becomes apparent when a unigram model is applied to both sentences:

Unigram Model: p(S) = p(w_1) p(w_2) ... p(w_N)

p(I love this story) = p(I) p(love) p(this) p(story)
p(This is a love story) = p(This) p(is) p(a) p(love) p(story)

Both sentences share three common words, so the probability models can be rewritten as:

p_i = p(this) p(love) p(story)

p(I love this story) = p_i p(I)
p(This is a love story) = p_i p(is) p(a)

The resulting difference between the probability models of these two sentiment-differing sentences is very small.

The fundamental difference between the two sentences can be clearly seen if one goes to the linguistic level. After parsing both sentences into their standard Chomskyan grammar form, the trees are as shown in Figures 1 and 2.

Figure 1. Positive Sentence Parse
Figure 2. Neutral Sentence Parse

The sentences are decomposed into Noun Phrases (NP), Verb Phrases (VP), and then individual word labels (NN - noun, VBZ - verb, etc.). The most important feature to notice is the label on the common word "love". In the positive sentence, the word is used as a verb, indicating an action performed by the author of the sentence. In the neutral sentence, "love" is used as a noun, simply indicating a thing (in this case, the story type), and therefore carries no sentiment. A powerful classifier for sentiment analysis would take the label of each word into consideration.
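The part-of-speech distinction is easy to observe with an off-the-shelf tagger. The sketch below tags both example sentences with NLTK; the paper does not name the parser it used, so this tool choice is an assumption.

# Illustrative POS-tagging sketch using NLTK (an assumed tool choice).
# Requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
import nltk

for sent in ["I love this story.", "This is a love story."]:
    print(nltk.pos_tag(nltk.word_tokenize(sent)))
# 'love' is tagged as a verb (VBP) in the first sentence and as a
# noun (NN) in the second, a distinction invisible to a unigram model.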

5. LINGUISTIC TREE TRANSFORMATION ALGORITHM

With the word sense disambiguation and non-local dependencies problems at the forefront, a tree-pruning algorithm was developed. From [4], it was clear that a linguistic tree algorithm must be reasonably simple for computation time purposes. The work in [2] was a good start in determining how the trees should be pruned, but changes were needed to create a less sparse training feature set. After empirical analysis of many example movie reviews (from [6]) and examination of the corresponding tree structures of their sentences, it was determined that one of the first steps should be to remove all leaves not labeled as a noun, verb, or adjective. It was also observed that most movie review documents are filled with fairly verbose sentences; while retaining the overall tree structure is important, there must be some flattening mechanism to ensure that large, complex tree structures are simplified. With these two needed actions in mind (pruning and flattening), the following algorithm was devised.

The Linguistic Tree Transformation Algorithm:

1. Parse the sentence into the standard Chomskyan tree structure.
2. Pruning - Eliminate all leaves not labeled as noun, verb, or adjective.
3. Pruning - Set phrase node labels to NULL.
4. Flattening - For each leaf, collapse it with its label value by concatenating the label value to the leaf node.
5. Flattening - For each node with only one leaf node, eliminate the node and raise the leaf node up one depth level in the tree.
6. Create a list of all the single nouns, verbs, and adjectives.
7. Create a list of all noun-verb, noun-adjective, and verb-adjective pairs at the same tree depth.

Figure 3. Linguistic Tree Transformation - Step 1
Figure 4. Linguistic Tree Transformation - Step 2
Figure 5. Linguistic Tree Transformation - Step 3
Figure 6. Linguistic Tree Transformation - Step 5

Nouns: movie, plot, element
Verbs: is
Adjectives: bad, idiotic
Noun-Verb Pairs: (movie, is), (plot, is), (element, is)
Noun-Adjective Pairs: (movie, bad), (element, bad), (plot, bad), (movie, idiotic), (plot, idiotic), (element, idiotic)
Verb-Adjective Pairs: (is, idiotic), (is, bad)

Table 1. Lists extracted by the Linguistic Tree Transformation Algorithm from the tree in Figure 6.

As seen in Table 1, the two important bigram elements ({plot, idiotic} and {movie, bad}) have been extracted from the sentence. Also obtained are the classification labels (noun, verb, adjective) of the critical words of the sentence. This is an example of how the Linguistic Tree Transformation Algorithm is a powerful tool for solving both the word sense disambiguation problem and the non-local dependencies problem.
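As a concrete illustration of steps 2, 6, and 7, the following Python sketch extracts the word lists and pairs from an NLTK constituency tree of the example sentence. The hand-written bracketing and the relaxed pairing rule are assumptions; the flattening steps and the same-depth restriction on pairs are omitted for brevity.

# Minimal sketch of the pruning and pair-extraction steps on a
# hand-written constituency tree (the exact bracketing is assumed;
# the paper does not specify which parser produced its trees).
from itertools import product
from nltk import Tree

t = Tree.fromstring(
    "(S (PP (IN In) (NP (DT a) (NN movie) (DT this) (JJ bad)))"
    " (NP (CD one) (NN plot) (NN element))"
    " (VP (VBZ is) (ADVP (RB really)) (JJ idiotic)))")

def words_by_type(tree, prefix):
    # Steps 2 and 6: keep only leaves whose POS label starts with prefix.
    return [w for w, tag in tree.pos() if tag.startswith(prefix)]

nouns = words_by_type(t, "NN")
verbs = words_by_type(t, "VB")
adjs = words_by_type(t, "JJ")

# Step 7 (simplified): pair every noun/verb/adjective combination;
# the paper additionally requires the pair to sit at the same tree
# depth after flattening, which this sketch omits.
pairs = (list(product(nouns, verbs)) + list(product(nouns, adjs))
         + list(product(verbs, adjs)))
print(nouns, verbs, adjs)
print(pairs)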

6. OBJECTIVE SENTENCE REMOVAL ALGORITHM

Previous work on sentiment analysis ([1-3]) used datasets in which objective sentences had already been removed from the movie reviews. For sentiment analysis to become completely automated, an algorithm must be developed that accounts for objective sentences. In a movie review, the vast majority of the document usually consists of an explanation of the plot. This would greatly corrupt the results of a classifier, as two movies of differing quality but with the same plot would generally receive the same sentiment classification. After analysis of several full movie reviews (from [6]), it was determined that a simplified approach to removing objective sentences could be taken that results in a satisfactory collection of subjective sentences.

The Objective Sentence Removal Algorithm:

1. Assume that the algorithm has prior knowledge of the movie title, the director's name, and the screenwriter's name.
2. Examine each sentence in the document. Replace each occurrence of the movie title with MOVIE, each occurrence of the director's name with DIRECTOR, and each occurrence of the screenwriter's name with SCREENWRITER.
3. Examine each sentence in the document; if the sentence does not contain at least one word from List 1, eliminate it from the document.
4. Create a document from the sentences that have not been eliminated.

MOVIE
DIRECTOR
SCREENWRITER
film
script
performance
plot

List 1. Word List for Objective Sentence Removal

For comparison purposes, the Objective Sentence Removal Algorithm was applied to the Roger Ebert review of the film The China Syndrome ([6]):

The China Syndrome is a terrific thriller that incidentally raises the most unsettling questions about how safe nuclear power plants really are. But the movie is, above all, entertainment: well-acted, well-crafted, scary as hell. The director, James Bridges, uses an exquisite sense of timing and character development to bring us to the cliffhanger conclusion. The events leading up to the accident in The China Syndrome are indeed based on actual occurrences at nuclear plants. Even the most unlikely mishap (a stuck needle on a graph causing engineers to misread a crucial water level) really happened at the Dresden plant outside Chicago. The key character is Godell (Jack Lemmon), a shift supervisor at a big nuclear power plant in Southern California. He lives alone, quietly, and can say without any self-consciousness that the plant is his life. He believes in nuclear power.

Text 1: Original Paragraph

Looking at this excerpt of the movie review, one can see several sentences related to the plot that contain no sentiment about the movie. Analyzing a sentence such as "He believes in nuclear power." adds no knowledge about the reviewer's opinion of the film, and would therefore act as noise to a sentiment classifier.

MOVIE is a terrific thriller that incidentally raises the most unsettling questions about how safe nuclear power plants really are. But the MOVIE is, above all, entertainment: well-acted, well-crafted, scary as hell. The director, DIRECTOR, uses an exquisite sense of timing and character development to bring us to the cliffhanger conclusion. The events leading up to the accident in MOVIE are indeed based on actual occurrences at nuclear plants.

Text 2: Paragraph after Algorithm

As seen in the text above, the Objective Sentence Removal Algorithm removes almost every objective sentence in the document, the exception being the last retained sentence, which is kept because it contains the name of the film. The algorithm retains all the subjective sentences containing sentiment about the film.
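A minimal Python sketch of this procedure is given below; the sentence splitter, the case-insensitive matching, and the screenwriter's name are assumptions, since the paper does not specify these implementation details.

# Minimal sketch of the Objective Sentence Removal Algorithm.
# Sentence splitting is naive (punctuation-based) and the metadata
# values below are illustrative assumptions.
import re

KEEP_WORDS = {"MOVIE", "DIRECTOR", "SCREENWRITER",
              "film", "script", "performance", "plot"}

def remove_objective(text, title, director, screenwriter):
    # Step 2: replace known names with placeholder tokens.
    for name, token in [(title, "MOVIE"), (director, "DIRECTOR"),
                        (screenwriter, "SCREENWRITER")]:
        text = re.sub(re.escape(name), token, text, flags=re.IGNORECASE)
    # Steps 3-4: keep only sentences containing a List 1 word.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept = [s for s in sentences if KEEP_WORDS & set(re.findall(r"\w+", s))]
    return " ".join(kept)

review = ("The China Syndrome is a terrific thriller. "
          "He believes in nuclear power.")
print(remove_objective(review, "The China Syndrome",
                       "James Bridges", "Mike Gray"))
# -> "MOVIE is a terrific thriller."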
7. RESULTS

The lists output by the Linguistic Tree Transformation Algorithm were arranged into frequency-feature SVM form (for use with the SVM-light software package [7]). Performance was tested against a frequency unigram SVM model and a frequency bigram SVM model (again using [7]).
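For reference, SVM-light reads one example per line in a sparse "label index:value" format with feature indices in ascending order. The sketch below shows one plausible conversion of the extracted lists into that format; the feature-naming scheme (word/tag strings and pair tuples) is an assumption, not the paper's exact representation.

# Sketch: convert extracted word/pair lists into SVM-light's sparse
# "label index:value" format, using term frequencies as values.
from collections import Counter

def to_svmlight(label, features, vocab):
    # label: +1/-1; features: extracted words/pairs; vocab: feature -> id.
    counts = Counter(vocab[f] for f in features if f in vocab)
    body = " ".join(f"{i}:{c}" for i, c in sorted(counts.items()))
    return f"{label} {body}"

vocab = {"movie/NN": 1, "bad/JJ": 2, ("movie", "bad"): 3}
print(to_svmlight(-1, ["movie/NN", "bad/JJ", ("movie", "bad")], vocab))
# -> "-1 1:1 2:1 3:1"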

The first dataset tested was the subjective-sentence-only Sentence Polarity Dataset v1.0, originally created by Pang and Lee. The dataset contains 5331 positive and 5331 negative processed sentences, with all objective sentences removed by hand. From this dataset, the first 4000 sentences were used to form a training set, and the remaining 1331 sentences were used to test classification accuracy.

Unigram SVM: 75.11%
Bigram SVM: 71.04%
Linguistic Tree Transform SVM: 84.09%

Table 2. Pang-Lee Accuracy Results

The second dataset used was 120 complete reviews (30 zero-star, 30 one-star, 30 three-star, and 30 four-star reviews) taken from [6]. All 120 reviews were formed by adding the header shown in Figure 7, followed by the original review with absolutely no modification. Because full reviews were used, the documents contained many objective sentences (plot description, etc.). The performance of the algorithms was tested both with and without the Objective Sentence Removal Algorithm. Both tests used 60 documents (30 positive (three- and four-star) and 30 negative (one- and zero-star) reviews) to train the classifier, and then tested classification accuracy on the remaining 60 documents (30 positive and 30 negative).

-film title-
-director name-
-screenwriter name-

Figure 7. Ebert Review Header

Unigram SVM: 65.00%
Bigram SVM: 63.33%
Linguistic Tree Transform SVM: 100.00%

Table 3. Ebert (without Objective Removal) Accuracy Results

Unigram SVM: 83.33%
Bigram SVM: 65.00%
Linguistic Tree Transform SVM: 100.00%

Table 4. Ebert (with Objective Removal) Accuracy Results

8. FUTURE DIRECTIONS

The development of this algorithm leaves many openings for future improvement. It was hypothesized that the inclusion of synonyms would improve accuracy rates, and the algorithm code includes functionality for using synonyms from the WordNet ([8]) software package. When this functionality was enabled, however, the WordNet-modified accuracy rate was actually lower than without the synonym data. After analyzing the synonyms returned by WordNet, it was determined that the software returned a large number of synonyms that most people would not use in regular conversation ("good" returns the word "goodness"), thus adding noise to the classification system. Because of this problem, the WordNet functionality was turned off in the algorithm. Future work could modify the WordNet data into a form useful for sentiment classification. Other possible directions are accounting for the use of sarcasm, taking antonyms when "not" appears before an adjective, and extending the algorithm to classify individual star ratings.
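The noisy-synonym behavior is easy to reproduce. The sketch below queries WordNet through NLTK's corpus interface; the paper interfaced with the WordNet package directly, so this wrapper is an assumption.

# Sketch of the synonym-expansion idea via NLTK's WordNet interface
# (an assumed wrapper; the paper used WordNet [8] directly).
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def synonyms(word):
    # Collect lemma names across all synsets of the word.
    return {lemma.name() for syn in wn.synsets(word)
            for lemma in syn.lemmas()} - {word}

print(sorted(synonyms("good")))
# The output includes rarely spoken lemmas such as 'goodness',
# illustrating the noise that led to this feature being disabled.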
9. CONCLUSIONS

The results show the power of the two algorithms introduced in this paper. The Linguistic Tree Transformation Algorithm consistently outperforms the established N-gram methods, with slightly less than a nine percentage-point accuracy improvement on the Sentence Polarity Dataset v1.0. On the Roger Ebert dataset, the Linguistic Tree Transformation Algorithm achieves perfect classification both with and without objective sentence removal. The need for objective sentence removal, and the strength of the Objective Sentence Removal Algorithm, can be seen in the improvement of the N-gram classifiers (Unigram: 65% to 83.33%; Bigram: 63.33% to 65%).

10. REFERENCES

[1] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning Techniques," Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002.
[2] S. Matsumoto, H. Takamura, and M. Okumura, "Sentiment Classification using Word Sub-Sequences and Dependency Sub-Trees," Proceedings of PAKDD, 2005.
[3] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," Proceedings of the ACL, 2005.
[4] M. Collins, "A new statistical parser based on bigram lexical dependencies," Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 1996.

[5] D. Prescher, "A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars," The 15th European Summer School in Logic, Language and Information, 2003.
[6] R. Ebert, Roger Ebert Reviews, www.rogerebert.com, 2006.
[7] T. Joachims, "Making large-scale SVM Learning Practical," Advances in Kernel Methods - Support Vector Learning, 1999.
[8] C. Fellbaum, WordNet: An Electronic Lexical Database, MIT Press, 1998.