SENTIMENT CLASSIFICATION OF MOVIE REVIEWS USING LINGUISTIC PARSING

Brian Eriksson
bceriksson@wisc.edu
CS 838 - Natural Language Processing
Final Project Report

ABSTRACT

The problem of sentiment analysis requires a deeper understanding of the English language than previously established techniques in the field provide. The Linguistic Tree Transformation Algorithm is introduced as a method to exploit the syntactic dependencies between words in a sentence and to disambiguate word senses. The algorithm is tested against the established Pang/Lee dataset and a new collection of Roger Ebert reviews. A new method of objective sentence removal is also introduced to improve established methods of sentiment analysis on full reviews with no manual extraction of objective sentences.

1. INTRODUCTION

The last few years have seen an explosion in the number of papers on the topic of sentiment analysis. This represents a fundamental shift in the area of Natural Language Processing. Previously, the underlying problem had been one of topic classification, where one is concerned only with what is being communicated. With sentiment analysis, a deeper understanding of the document must be extracted: the concern shifts from what is being communicated to how it is being communicated. Previous papers on this problem ([1-3]) ignore the fundamental richness of the English used to communicate sentiment, instead relying on established methods (N-grams, etc.) that throw many of these useful features away. New methods that use linguistic techniques to exploit the structure of English must be found to improve sentiment classification rates. Any new method must address two large problems in the area of sentiment analysis: the non-local dependency problem and the word sense disambiguation problem.

2. PREVIOUS WORK

The cornerstone of work on sentiment analysis is Pang and Lee's 2002 paper [1].
The authors of that paper compare Naive Bayes, Maximum Entropy, and Support Vector Machine approaches to classifying the sentiment of movie reviews. They explain the relatively poor performance of these methods (versus a standard topic classification problem) as a consequence of sentiment analysis requiring a deeper understanding of the document under analysis. In 2005 ([3]), they returned to the topic and examined multi-class performance on a finer-scale star rating dataset. They added a nearest neighbor classifier to their collection of approaches, but the results still show great room for improvement.

A better approach is taken by Matsumoto et al. in [2]. The authors of that paper recognize that word order and the syntactic relations between words are extremely important for sentiment classification, and therefore must not be discarded. The approach they propose involves taking each sentence of a review and constructing a dependency tree, which is then pruned to create subtrees for classification. Each subtree captures the connections between words while retaining their syntactic relationships and order in the original sentence. One drawback of this approach is that a great number of these subtrees are produced in the training stage of the algorithm, and for performance reasons all but the N most frequent subtrees are discarded.

Outside the area of sentiment analysis, focusing instead on the classification of documents using linguistic parsing, is Michael Collins in [4]. Collins develops a distance metric for extracting dependency bigrams from linguistic tree structures. The classification rates are quite good, but the algorithm runs fairly slowly. A simpler tree parsing algorithm may achieve similar classification rates while keeping the processing time practical.

3.
NON-LOCAL DEPENDENCIES PROBLEM

One of the fundamental problems in extracting meaning from a sentence is the non-local dependency problem. Often, two words that are syntactically linked in a sentence are separated by several other words. In these cases, N-gram models with small values of N fail to extract a correlation between the two
words. A new method must be devised to find pairs of words that are syntactically linked. A clear example of this problem is found in the following sentence (from [6]):

In a movie this bad, one plot element is really idiotic.

To anyone reading this sentence, it should be obvious that the author is communicating two sentiments: first, that the movie was bad; second, that the plot element was idiotic. Using standard N-gram approaches, a trigram model would be necessary to capture the dependency (movie, bad), and a 5-gram model would be necessary for (plot, idiotic). A powerful classifier for sentiment analysis would extract the non-local bigrams (movie, bad) and (plot, idiotic), while collecting a minimum number of other, sentiment-lacking bigrams.

4. WORD SENSE DISAMBIGUATION PROBLEM

The word sense disambiguation problem is mentioned by the authors of [1] as a fundamental problem in the area of sentiment analysis. As an example, they use two sentences:

Sentence 1: I love this story.
Sentence 2: This is a love story.

It should be obvious to the reader that the first sentence communicates positive sentiment, while the second is an objective statement with neutral sentiment. The fundamental difference between these two sentences can be clearly seen at the linguistic level: after both sentences are parsed into their standard Chomskyan grammar form, the resulting trees are shown in Figures 1 and 2.

Figure 1. Positive Sentence Parse

The problem with standard Natural Language Processing techniques becomes apparent when a unigram model is applied to both sentences.
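Both observations above — the n-gram window widths needed for the non-local pairs in Section 3, and the heavy unigram overlap of the two sentences here — can be checked with a short sketch. This is illustrative only, not the project's code; tokenization is lowercased and punctuation is dropped.

```python
# Toy sketches of the two problems (illustrative only).
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Section 3's example sentence: (movie, bad) first co-occurs in a
# trigram window, and (plot, idiotic) only in a 5-gram window.
tokens = "in a movie this bad one plot element is really idiotic".split()
assert not any("movie" in g and "bad" in g for g in ngrams(tokens, 2))
assert any("movie" in g and "bad" in g for g in ngrams(tokens, 3))
assert not any("plot" in g and "idiotic" in g for g in ngrams(tokens, 4))
assert any("plot" in g and "idiotic" in g for g in ngrams(tokens, 5))

# The two example sentences here share three unigrams, so a
# bag-of-words model sees them as nearly identical.
s1 = Counter("i love this story".split())
s2 = Counter("this is a love story".split())
assert set(s1) & set(s2) == {"love", "this", "story"}
```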
Unigram Model: p(S) = p(w_1) p(w_2) ... p(w_N)

p(I love this story) = p(I) p(love) p(this) p(story)
p(This is a love story) = p(This) p(is) p(a) p(love) p(story)

The two sentences have three words in common, so with p_i = p(this) p(love) p(story) the models can be rewritten as:

p(I love this story) = p_i p(I)
p(This is a love story) = p_i p(is) p(a)

The resulting difference between the probability models of the two sentences is very small, despite their opposite sentiment.

Figure 2. Neutral Sentence Parse

The sentences are decomposed into Noun Phrases (NP), Verb Phrases (VP), and then individual word labels (NN - noun, VBZ - verb, etc.). The most important feature to notice is the label on the shared word "love". In the positive sentence, the word is used as a verb, indicating an action performed by the author of the sentence. In the neutral sentence, "love" is used as a noun, simply to indicate a thing (in this case, the type of story), and therefore has no sentiment attached. A powerful classifier for sentiment analysis would take the label of each word into consideration.

5. LINGUISTIC TREE TRANSFORMATION ALGORITHM

With the word sense disambiguation and non-local dependency problems at the forefront, a tree-pruning algorithm was developed. From [4], it was clear that a linguistic tree algorithm must be reasonably simple for computation time reasons. The work in [2] was a good start in determining how the trees
should be pruned, but changes must be made to create a less sparse training feature set. After empirical analysis of many example movie reviews (from [6]) and examination of the corresponding tree structures of their sentences, it was determined that one of the first steps should be to remove all leaves not labeled as a noun, verb, or adjective. It was also observed that most movie review documents are filled with fairly verbose sentences; while retaining the overall tree structure is important, there must be some flattening mechanism to ensure that large, complex tree structures are simplified. With these two operations (pruning and flattening) in mind, the following algorithm was devised.

The Linguistic Tree Transformation Algorithm:

1. Parse the sentence into the standard Chomskyan tree structure
2. Pruning - Eliminate all leaves not labeled as a noun, verb, or adjective
3. Pruning - Set all phrase node labels to NULL
4. Flattening - For each leaf, collapse it with its label value by concatenating the label value to the leaf node
5. Flattening - For each node with only one leaf node, eliminate the node and raise the leaf node up one depth level in the tree
6. Create a list of all single nouns, verbs, and adjectives
7. Create a list of all noun-verb, noun-adjective, and verb-adjective pairs at the same tree depth

Figure 3. Linguistic Tree Transformation - Step 1
Figure 4. Linguistic Tree Transformation - Step 2
Figure 5. Linguistic Tree Transformation - Step 3
Figure 6. Linguistic Tree Transformation - Step 5

Nouns:                 movie, plot, element
Verbs:                 is
Adjectives:            bad, idiotic
Noun-Verb Pairs:       (movie, is) (plot, is) (element, is)
Noun-Adjective Pairs:  (movie, bad) (element, bad) (plot, bad) (movie, idiotic) (plot, idiotic) (element, idiotic)
Verb-Adjective Pairs:  (is, idiotic) (is, bad)

Table 1. Lists extracted by the Linguistic Tree Transformation Algorithm from the tree in Figure 6.

As seen in Table 1, the two important bigrams ({plot, idiotic} and {movie, bad}) have been extracted from the sentence, along with the part-of-speech labels (noun, verb, adjective) of its critical words. This illustrates how the Linguistic Tree Transformation Algorithm is a powerful tool for addressing both the word sense disambiguation problem and the non-local dependency problem.

6. OBJECTIVE SENTENCE REMOVAL ALGORITHM

Previous work on sentiment analysis ([1-3]) used datasets of movie reviews from which objective sentences had already been removed. For sentiment analysis to become completely automated, an algorithm must be developed that
takes objective sentences into account. In movie reviews, the vast majority of the document usually consists of an explanation of the plot. For a classifier this greatly corrupts the results, as two movies of differing quality but with the same plot would generally receive the same sentiment classification. After analysis of several full movie reviews (from [6]), it was determined that a simplified approach to removing objective sentences can produce a satisfactory collection of subjective sentences.

The Objective Sentence Removal Algorithm:

1. Assume that the algorithm has prior knowledge of the movie title, the director's name, and the screenwriter's name.
2. Examine each sentence in the document. Replace each occurrence of the movie title with MOVIE, each occurrence of the director's name with DIRECTOR, and each occurrence of the screenwriter's name with SCREENWRITER.
3. Examine each sentence in the document; if the sentence does not contain at least one word from List 1, eliminate it from the document.
4. Create a document from the sentences that have not been eliminated.

MOVIE
DIRECTOR
SCREENWRITER
film
script
performance
plot

List 1. Word List for Objective Sentence Removal

For comparison purposes, the Objective Sentence Removal Algorithm was run on the Roger Ebert review of the film The China Syndrome ([6]):

The China Syndrome is a terrific thriller that incidentally raises the most unsettling questions about how safe nuclear power plants really are. But the movie is, above all, entertainment: well-acted, well-crafted, scary as hell. The director, James Bridges, uses an exquisite sense of timing and character development to bring us to the cliffhanger conclusion. The events leading up to the accident in The China Syndrome are indeed based on actual occurrences at nuclear plants.
Even the most unlikely mishap (a stuck needle on a graph causing engineers to misread a crucial water level) really happened at the Dresden plant outside Chicago. The key character is Godell (Jack Lemmon), a shift supervisor at a big nuclear power plant in Southern California. He lives alone, quietly, and can say without any self-consciousness that the plant is his life. He believes in nuclear power.

Text 1: Original Paragraph

Looking at this excerpt of the review, one can see several sentences related to the plot that contain no sentiment about the movie. Analyzing a sentence such as "He believes in nuclear power." would yield no added knowledge about the reviewer's thoughts on the film, and would therefore be noise to a sentiment classifier.

MOVIE is a terrific thriller that incidentally raises the most unsettling questions about how safe nuclear power plants really are. But the MOVIE is, above all, entertainment: well-acted, well-crafted, scary as hell. The director, DIRECTOR, uses an exquisite sense of timing and character development to bring us to the cliffhanger conclusion. The events leading up to the accident in MOVIE are indeed based on actual occurrences at nuclear plants.

Text 2: Paragraph after Algorithm

As seen in the text above, the Objective Sentence Removal Algorithm removes almost every objective sentence in the document, the exception being the last sentence, which is kept because it contains the name of the film. The algorithm retains all of the subjective sentences containing sentiment about the film.

7. RESULTS

The lists output by the Linguistic Tree Transformation Algorithm were arranged into frequency SVM model form (for use with the SVM-light software package [7]). Performance was tested against a frequency unigram SVM model
and a frequency bigram SVM model (again, using [7]).

The first dataset tested was the subjective-sentence-only Sentence Polarity Dataset v1.0, originally created by Pang and Lee. The dataset contains 5331 positive and 5331 negative processed sentences, with all objective sentences removed through user interaction. From this dataset, the first 4000 sentences were used to form a training set, and the remaining 1331 sentences were used to test accuracy.

Unigram SVM                      75.11%
Bigram SVM                       71.04%
Linguistic Tree Transform SVM    84.09%

Table 2. Pang-Lee Accuracy Results

The second dataset consisted of 120 complete reviews (30 zero-star, 30 one-star, 30 three-star, and 30 four-star reviews) taken from [6]. All 120 reviews were formed by adding the header in Figure 7, followed by the original review with absolutely no modification. Because full reviews were used, the documents contained many objective sentences (plot description, etc.). The performance of the algorithms was tested both with and without the Objective Sentence Removal Algorithm. Both tests use 60 documents (30 positive (three- and four-star) and 30 negative (one- and zero-star) reviews) to train the classifier, and then test classification accuracy on the remaining 60 documents (30 positive and 30 negative).

-film title-
-director name-
-screenwriter name-

Figure 7. Ebert Review Header

Unigram SVM                      65.00%
Bigram SVM                       63.33%
Linguistic Tree Transform SVM   100.00%

Table 3. Ebert (w/o Objective Removal) Accuracy Results

Unigram SVM                      83.33%
Bigram SVM                       65.00%
Linguistic Tree Transform SVM   100.00%

Table 4. Ebert (w/ Objective Removal) Accuracy Results

8. FUTURE DIRECTIONS

The development of this algorithm leaves many openings for future improvement. It was hypothesized that the inclusion of synonyms would improve accuracy rates, and the algorithm code currently includes functionality for using synonyms from the WordNet ([8]) software package.
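The synonym-expansion step can be sketched as follows. This is a toy illustration, not the project's code: a small hand-coded table (hypothetical entries) stands in for WordNet, to show how expansion injects rarely used synonyms into the feature set.

```python
# Toy sketch of synonym expansion; the table below is a hypothetical
# stand-in for WordNet, for illustration only.
SYNONYMS = {
    "good": ["goodness", "well"],
    "bad": ["badness", "poor"],
}

def expand_features(words, table):
    """Return the original words plus every listed synonym."""
    expanded = list(words)
    for w in words:
        expanded.extend(table.get(w, []))
    return expanded

features = expand_features(["good", "movie"], SYNONYMS)
# "goodness" now enters the feature set even though reviewers rarely
# write it -- the kind of noise discussed below.
assert features == ["good", "movie", "goodness", "well"]
```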
When implemented, the WordNet-modified accuracy rate was actually lower than without the synonym data. After analyzing the synonyms returned by the WordNet software package, it was determined that the software returned a large number of synonyms that most people would not use in regular conversation ("good" returns the word "goodness"), thus adding noise to the classification system. Because of this problem, the WordNet functionality was turned off in the algorithm. Future work could modify the WordNet data into a form useful for sentiment classification. Other possible directions are accounting for the use of sarcasm, substituting antonyms when "not" appears before an adjective, and extending the algorithm to classify individual star ratings.

9. CONCLUSIONS

The results show the power of the two algorithms introduced in this paper. The Linguistic Tree Transformation Algorithm consistently performs better than the established N-gram methods, with a classification accuracy improvement of slightly less than nine percentage points on the Sentence Polarity Dataset v1.0. On the Roger Ebert dataset, the Linguistic Tree Transformation Algorithm achieves perfect classification both with and without objective sentence removal. The need for objective sentence removal, and the strength of the Objective Sentence Removal Algorithm, can be seen in the improvement of the N-gram methods' classification rates (Unigram - 65% to 83.33%, Bigram - 63.33% to 65%).

10. REFERENCES

[1] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment Classification using Machine Learning Techniques," Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002.
[2] S. Matsumoto, H. Takamura, and M. Okumura, "Sentiment Classification using Word Sub-Sequences and Dependency Sub-Trees," Proceedings of PAKDD, 2005.
[3] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," Proceedings of the ACL, 2005.
[4] M.
Collins, "A New Statistical Parser Based on Bigram Lexical Dependencies," Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 1996.
[5] D. Prescher, "A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars," The 15th European Summer School in Logic, Language and Information, 2003.
[6] R. Ebert, Roger Ebert Reviews, www.rogerebert.com, 2006.
[7] T. Joachims, "Making Large-Scale SVM Learning Practical," Advances in Kernel Methods - Support Vector Learning, 1999.
[8] C. Fellbaum, WordNet: An Electronic Lexical Database, MIT Press, 1998.