Word Vectors in Sentiment Analysis

e-ISSN 2455-1392, Volume 2, Issue 5, May 2016, pp. 594-598
Scientific Journal Impact Factor: 3.468
http://www.ijcter.com

Shamseera sherin P.¹, Sreekanth E. S.²
¹PG Scholar, ²Asst. Professor
¹,²Department of Computer Science and Engineering, MES College of Engineering, Kuttippuram, Kerala, 679573, India

Abstract: Sentiment analysis, the task of determining the subjective attitude (i.e., sentiment) expressed by a text, is becoming a hotspot in the field of natural language processing. The basic task of sentiment analysis is to determine the class (positive vs. negative) of a given text. It is very important to represent the sentiment with efficient features in order to improve sentiment analysis. A supervised feature model such as the bag-of-words (BOW) model represents words as indices in a vocabulary. Unsupervised models such as Word2Vec and GloVe are typically used to produce feature vectors in natural language processing tasks. The BOW model fails to capture the rich relational structure of the lexicon, while the unsupervised models fail to capture sentiment information. In our work we introduce a new feature model that combines the supervised model with the unsupervised models. We evaluate the performance of the features from these different approaches using logistic regression as the classification algorithm.

Keywords: Natural language processing, sentiment analysis, word vector

I. INTRODUCTION

Sentiment analysis is one of the tasks in natural language processing. With the wide use of the internet, people are able to share a variety of information with the public. This information often includes opinions or sentiments about products. A large body of work has been devoted to analysing this information, which is called sentiment analysis. Sentiment analysis determines whether a text is positive or negative, and it has been carried out at different levels, including words, sentences, and documents.
We address the task of classifying sentences as positive or negative, because this task is fundamental and widely applicable in sentiment analysis. For example, one can retrieve an individual's opinions related to a product and find out whether they have a positive attitude towards the product. There has been much work on identifying the sentiment polarity of words. For instance, "good" is positively oriented, while "bad" is negatively oriented. A predefined polarity dictionary can then be used to look up sentiment words. Sentiment words are a key resource for sentiment analysis and thus have great potential for applications. However, how to use sentiment words effectively to improve the performance of sentiment classification remains a challenge.

The main task in sentiment analysis is sentiment classification. The bag-of-words (BOW) model is typically used for text representation: a review text is represented by a vector of independent words. Machine learning algorithms such as logistic regression and support vector machines are then used to train a sentiment classifier. The BOW model is simple and efficient in topic-based text classification, but it is not well suited to sentiment classification because it breaks the syntactic structure, disrupts the word order, and discards some semantic information. Much research on sentiment analysis has aimed to enhance BOW. However, due to the fundamental deficiencies of BOW, most of these efforts yielded only very small improvements in classification accuracy.

@IJCTER-2016, All rights Reserved

The polarity shift problem is the best-known difficulty of the BOW model. Polarity shift is a linguistic phenomenon that can reverse the sentiment polarity of a text. Negation is the most important type of polarity shift. For example, adding the negation word "don't" to the positive text "I like this movie" reverses its sentiment from positive to negative, yet the two texts are considered very similar under the BOW representation. This is the main reason why standard machine learning algorithms fail under polarity shift.

We propose an effective model for text representation based on vector representations of words. These can be broadly divided into three classes: one-hot vectors, distributional semantic vectors, and distributed word vectors. In this work we focus mainly on distributed word vectors, also called word embeddings. Distributed representations give a low-dimensional, real-valued vector for each word. Distributed word embedding techniques are mainly based on Bengio's neural probabilistic language model [Bengio et al., 2003].

II. RELATED WORK

Several approaches have been proposed in the literature to address the limitations of the BOW model. However, most of them require either complex linguistic knowledge or extra human annotations. Tasks in sentiment analysis can be divided into four types based on the level of granularity: document-level, sentence-level, phrase-level, and aspect-level sentiment analysis. Focusing on phrase/subsentence-level and aspect-level sentiment analysis, Wilson et al. [1] studied the effects of polarity shift. They began with a lexicon of words with established prior polarities and identified the contextual polarity of phrases based on some annotations. Choi and Cardie [2] further combined different types of negators with lexical polarity items through various compositional semantic models, both heuristic and machine learned, to improve subsentential sentiment analysis. Nakagawa et al.
[3] developed a semi-supervised model for subsentential sentiment analysis that predicts polarity based on the interactions between nodes in dependency graphs, which can potentially induce the scope of negation. In aspect-level sentiment analysis, the polarity shift problem has been considered in both corpus-based and lexicon-based methods [4], [5].

There are two main types of methods in the literature for document- and sentence-level sentiment classification: term-counting methods and machine learning methods. In term-counting methods, the overall orientation of a text is obtained by summing the orientation scores of the content words in the text, based on manually collected lexical resources. In machine learning methods, sentiment classification is treated as a classification problem in which a text is represented by a bag of words, and supervised machine learning algorithms are then applied as the classifier [6].

The handling of polarity shift also differs between the two types of methods. Term-counting methods can be extended to include polarity shift. One way is to directly reverse the sentiment of polarity-shifted words and then sum the sentiment scores word by word. Machine learning methods are more widely used in sentiment classification research. However, it is relatively complicated to integrate polarity shift information into the BOW model in machine learning methods. For example, Das and Chen [7] designed a model that simply attaches "NOT" to words in the scope of negation, so that in the text "I don't like movie" the word "like" becomes a new word "like-NOT". Yet Pang et al. [11] reported that this method has only a very slight effect on sentiment classification accuracy. There have also been attempts to model polarity shift by using more linguistic features or lexical resources. For example, Na et al. [8] proposed modelling negation by looking for clear-cut part-of-speech tag patterns.
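The negation-tagging scheme attributed to Das and Chen [7] above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the original authors' implementation: the small negation word list, the rule that a negation scope ends at the next punctuation mark, and the function name `tag_negation` are all hypothetical choices made for this example.

```python
import re

# Hypothetical, deliberately small list of negation cues (an assumption).
NEGATION_WORDS = {"not", "no", "never", "don't", "dont", "didn't", "isn't", "can't"}

# Assumption: a negation scope ends at the next punctuation mark.
SCOPE_END = {".", ",", ";", "!", "?"}


def tag_negation(text):
    """Append "-NOT" to every word that follows a negation cue,
    up to the next punctuation mark."""
    # Lowercase, then split into words (keeping apostrophes) and punctuation.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?|[.,;!?]", text.lower())
    tagged = []
    in_scope = False
    for token in tokens:
        if token in SCOPE_END:
            in_scope = False          # punctuation closes the negation scope
            tagged.append(token)
        elif token in NEGATION_WORDS:
            in_scope = True           # a negation cue opens a new scope
            tagged.append(token)
        elif in_scope:
            tagged.append(token + "-NOT")
        else:
            tagged.append(token)
    return tagged


print(tag_negation("I don't like this movie, but the music is good."))
```

With this tagging, "like" and "like-NOT" become distinct vocabulary entries, so a BOW classifier can in principle assign them different weights.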
Kennedy and Inkpen [9] suggested using syntactic parsing to capture three classes of valence shifters (negations, intensifiers, and diminishers). Their results showed that handling polarity shift raised the performance of term-counting systems significantly, but the improvements upon the

baselines of machine learning systems are very slight (less than 1 percent). Ikeda et al. [10] designed a machine learning method based on a lexical dictionary extracted from the General Inquirer to model polarity shifters for both word-wise and sentence-wise sentiment classification.

III. WORD EMBEDDING

The Word2vec model was proposed by Tomas Mikolov. It differs from other distributed representations mainly in the removal of the non-linear hidden layer, which reduces the computational complexity. Word2vec generates word vectors using two different language modelling schemes: continuous bag of words (CBOW) and skip-gram.

A. Continuous Bag Of Word Model

In continuous bag of words (CBOW), for a given context size c, we try to predict the center word w_t from its surrounding context words. For example, consider the sentence "i love playing pranks on my friends": the output word is "pranks" for the given context words "i", "love", "playing", "on", "my", "friends". In the Word2vec model every word is represented by two vectors, an inner word vector and an outer word vector. Inner word vectors are used as the representation of the input words of the model, and outer word vectors are used as the representation of the output word. The CBOW model is trained so that the dot product between the average of the context words' inner vectors and the outer vector of the center word is larger than the dot product between the same average and the outer vector of any other word, which gives the center word the highest probability.

B. Skip-gram Model

The skip-gram model can be seen as the opposite of the CBOW model. Whereas CBOW predicts a word given its surrounding words, skip-gram takes a word and predicts the words surrounding it within the context window. The skip-gram model was also proposed by Tomas Mikolov. In the skip-gram model, words are likewise represented by two vectors, an inner word vector and an outer word vector.

IV. IMPLEMENTATION DETAILS

The experiment was conducted on the Ubuntu operating system, in a pipelined manner, using Python. There are two main stages: data preprocessing and learning. The output of the data preprocessing stage is given as input to the learning stage. In the data preprocessing stage the input data is converted to its dual form, and in the learning stage both the original data and its dual form are used for learning. Python plays a central role in this work, and on Linux it can be used without installing or configuring anything else. scikit-learn and NLTK are the Python packages used for the implementation.

V. RESULTS

We use the Movie Reviews data set for our experiment. We take 50000 samples to create the word vectors and preprocess the samples to remove unnecessary symbols such as HTML tags and non-alphabetic characters. We then compute a word vector for each word in the vocabulary, using a freely available word2vec package to learn the vectors. Word vectors are computed for dimensions 25, 50, and 100. We evaluate three feature sets for sentiment analysis: bag-of-words alone, word vectors alone, and bag-of-words combined with word vectors. Logistic regression is used as the learning algorithm, and we compute the precision, recall, and F-score for both the positive and the negative samples. The results are shown in Figs. 1 and 2.
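The three feature sets compared in this section (BOW alone, word vectors alone, and BOW combined with word vectors) can be illustrated with a small pure-Python sketch. It is not the authors' code. The toy 3-dimensional vectors stand in for vectors learned by a word2vec package, and two steps are assumptions about how such features might be built: averaging the word vectors of a review, and concatenating that average with the BOW counts. All function names are hypothetical.

```python
import re
from collections import Counter


def clean_text(raw):
    """Strip HTML tags and non-alphabetic characters, then lowercase and split."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    letters_only = re.sub(r"[^A-Za-z]+", " ", no_tags)
    return letters_only.lower().split()


def bow_features(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def mean_vector(tokens, word_vectors, dim):
    """Average the vectors of all tokens that have a vector; zeros if none do."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def combined_features(raw, vocabulary, word_vectors, dim):
    """Concatenate BOW counts with the averaged word vector (an assumption)."""
    tokens = clean_text(raw)
    return bow_features(tokens, vocabulary) + mean_vector(tokens, word_vectors, dim)


# Toy vectors with made-up values; real ones would come from a trained model.
toy_vectors = {
    "good": [1.0, 0.0, 2.0],
    "movie": [0.0, 1.0, 0.0],
}
vocabulary = ["bad", "good", "movie"]

features = combined_features("<br />A good, GOOD movie!", vocabulary, toy_vectors, dim=3)
print(features)  # first 3 entries: BOW counts, last 3 entries: averaged vector
```

Feature lists built this way could then be passed to a classifier such as scikit-learn's `LogisticRegression`. Changing the word-vector dimensionality (25, 50, or 100 in the experiments) only changes the length of the second part of each feature vector.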

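The per-class precision, recall, and F-score used in the evaluation can be computed as in the following sketch. The labels are made up for illustration and are not the paper's results. The function name is hypothetical, and the F-score is assumed to be the balanced F1 measure, since the text does not say which variant was used.

```python
def precision_recall_fscore(y_true, y_pred, target):
    """Return (precision, recall, F1) for the class `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == target and t == target)  # true positives
    fp = sum(1 for t, p in pairs if p == target and t != target)  # false positives
    fn = sum(1 for t, p in pairs if p != target and t == target)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    fscore = 2 * precision * recall / (precision + recall)
    return precision, recall, fscore


# Made-up labels for illustration only: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

for label, name in [(1, "positive"), (0, "negative")]:
    p, r, f = precision_recall_fscore(y_true, y_pred, label)
    print(f"{name}: precision={p:.3f} recall={r:.3f} fscore={f:.3f}")
```

Reporting the scores separately for the positive and the negative class, as the evaluation does, shows whether a feature set favours one class over the other.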
Fig. 1. Precision efficiency of positive sentiment
Fig. 2. Precision efficiency of positive sentiment

VI. CONCLUSION AND FUTURE WORK

Sentiment analysis is used to determine the subjective attitude expressed in a given text. The BOW model is used to represent the feature vector in sentiment analysis, and it works well for this task. Because of some limitations of the BOW model, we introduced a different feature vector based on Word2Vec. We evaluated the performance of Word2Vec in sentiment analysis and also combined the BOW model with Word2Vec as the feature vector. We evaluated the performance of the features from these different approaches using logistic regression as the classification algorithm. The results show that Word2Vec performs better than the BOW model. Future work will explore the GloVe model for sentiment analysis.

GloVe stands for Global Vectors. In some sense, GloVe can be seen as a hybrid approach, since it considers both the global context of words (through a co-occurrence matrix) and their local context (as in the skip-gram model).

REFERENCES

[1] Wilson et al., "Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis," Comput. Linguistics, vol. 35, no. 3, pp. 399-433, 2009.
[2] Y. Choi and C. Cardie, "Learning with compositional semantics as structural inference for subsentential sentiment analysis," in Proc. Conf. Empirical Methods Natural Language Process., vol. 6, pp. 100-101, 2006.
[3] T. Nakagawa, K. Inui, and S. Kurohashi, "Dependency tree-based sentiment classification using CRFs with hidden variables," Pacific Asia Conference on Language, Information and Computation, vol. 82, no. 1, pp. 35-45, 2012.
[4] X. Ding and B. Liu, "The utility of linguistic rules in opinion mining," in Proc. 30th ACM SIGIR Conf. Res. Development Inf. Retrieval, vol. 32, no. 11, pp. 58-63, September 2010.
[5] Rui Xia et al., "Dual Sentiment Analysis: Considering Two Sides of One Review," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 97, pp. 425-429, 2015.
[6] Pang and Lee, "Thumbs up? Sentiment Classification using Machine Learning Techniques," Proc. Conf. Empirical Methods Natural Language Process., pp. 79-86, 2002.
[7] Ikeda et al., "Learning to Shift the Polarity of Words for Sentiment Classification," Computational Intelligence, vol. 6, pp. 100-101, 2006.
[8] Rui and Huang, "Determining the sentiment of opinions," Pacific Asia Conference on Language, Information and Computation, vol. 82, no. 1, pp. 35-45, 2012.
[9] Soushan et al., "Sentiment Classification and Polarity Shifting," International Conference on Computational Linguistics, vol. 32, no. 11, pp. 58-63, September 2010.
[10] Rui Xia et al., "Dual Sentiment Analysis: Considering Two Sides of One Review," IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 97, pp. 425-429, 2015.