Sentiment Analysis and Visualization of Social Media Data
The #BostonMarathon #Bombings test case

Amir Salarpour, Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran
Mohammad Hossein Bamneshin, Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran
Dimitris Proios, Department of Information and Telematics, Harakopio University of Athens, Athens, Greece

Abstract
This work aims a) to perform sentiment analysis on social media data using machine learning methods and b) to propose a user-friendly visualization of these data.

Keywords: sentiment analysis, data visualization, machine learning

I. INTRODUCTION
The goal of this project is a) to perform sentiment analysis (SA) on social media text data (e.g. product reviews, tweets) using machine learning algorithms and b) to create a visualization summary of these data that also takes the SA output into account.

Fig. 1. Overall system's workflow

To perform sentiment analysis we pre-processed the text data using GATE (General Architecture for Text Engineering). The output was used to construct feature vectors (feature extraction), which in turn were used to train several machine learning models. Finally, the learnt models were evaluated in terms of accuracy, that is, the proportion of test instances (reviews) classified in the correct category. For the visualization task we used D3 (Data-Driven Documents), a JavaScript library for displaying data in graphical charts.

Fig. 2. Sentiment analysis flow

II. DATASET AND ANNOTATION
A. Dataset Description
The following two datasets were used for the experiments:

Product Reviews Corpus (PRC). The PRC is part of the Wishful Expressions Corpora [1]. It contains 1235 sentences from customer product reviews on Amazon.com and cnet.com, collected by Bing Liu (see footnote 1) and his colleagues and used in several publications.
Two examples of such reviews are the following:
"i will never buy their product again at this rate and neither should you"
"the product has worked perfectly for me on my xp"
This corpus was annotated by ILSP (see footnote 2) with respect to specific sentiment categories (see Section II.B below).

Boston Corpus (BC). The BC was collected and annotated by ILSP (see footnote 3). It contains 5000 tweets related to the Marathon event and the bombings that took place in Boston on 15/4/2013. The Marathon started at 09:00 and the explosions occurred at 14:59. The tweets were collected between 14/4/2013 at 21:00 and 15/4/2013 at 19:46. Some examples of these tweets are the following:
"Excited for the #bostonmarathon tomorrow #makeitcount #findgreatness"
"Best of luck to all those running #bostonmarathon today! Have FUN and enjoy!!"
"#bostonmarathon explosions! Horrifying site :("
"let's pray for the may affected #PrayersforBoston our prayers go out to Boston this afternoon."
We randomly selected and annotated a subset of 1027 tweets. This sub-corpus was initially used to test the best classifier trained on the product reviews corpus. In a second phase, it was combined with the product reviews corpus to train the final sentiment model.

1 The original data and publications can be found at
2 Institute for Language and Speech Processing, Athena R.C.
3 This corpus cannot be distributed without the permission of ILSP. Any questions on this data should be directed to Haris Papageorgiou (xaris@ilsp.gr)

B. Annotation and Inter-Annotator Agreement (IAA)
Product Reviews Corpus: Each sentence of the PRC was judged (see footnote 4) by two annotators with respect to the following categories:
Subjective: any text about private states (sentiments, opinions, emotions, feelings, thoughts, beliefs etc.) expressed by an author.
Objective: text that contains only factual information.
Positive: any text containing positive opinions, emotions, feelings etc. or facts/events that may trigger positive sentiment.
Negative: any text containing negative opinions, emotions, feelings etc. or facts/events that may trigger negative sentiment.
Praise: positive evaluations and opinions about specific entities or topics and their aspects, explicitly or implicitly expressed by an author.
Criticism: negative evaluations and opinions about specific entities or topics and their aspects, explicitly or implicitly expressed by an author.

To assess the agreement of the two annotators we used Cohen's kappa coefficient and PSA (Proportion of Specific Agreement). The results are shown in Table I.

TABLE I. INTER-ANNOTATOR AGREEMENT
Method            Sub   Obj   Pos   Neg   Praise   Crit
Kappa                                     0.6785   0.5690
PSA for 0-label                           0.8749   0.9071
PSA for 1-label                           0.8029   0.6616

When the two classes contain very different numbers of instances, the kappa coefficient can be misleading.
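Both agreement measures can be computed directly from the two annotators' binary labels. The following is a minimal sketch under stated assumptions: the function name `agreement` and the toy label vectors are hypothetical and are not taken from the annotated corpus; the formulas are the standard definitions of Cohen's kappa and of specific agreement for a two-by-two table.

```python
def agreement(labels_a, labels_b):
    """Cohen's kappa and PSA for two annotators with binary (0/1) labels.

    Assumes both lists have the same non-zero length and that the
    expected agreement is below 1 (otherwise kappa is undefined).
    """
    n = len(labels_a)
    pairs = list(zip(labels_a, labels_b))
    both_1 = sum(a == 1 and b == 1 for a, b in pairs)
    both_0 = sum(a == 0 and b == 0 for a, b in pairs)
    disagree = n - both_1 - both_0

    # Observed agreement and agreement expected by chance.
    p_observed = (both_1 + both_0) / n
    p1_a = sum(labels_a) / n
    p1_b = sum(labels_b) / n
    p_expected = p1_a * p1_b + (1 - p1_a) * (1 - p1_b)
    kappa = (p_observed - p_expected) / (1 - p_expected)

    # Specific agreement, reported separately for each label.
    psa_1 = 2 * both_1 / (2 * both_1 + disagree)
    psa_0 = 2 * both_0 / (2 * both_0 + disagree)
    return kappa, psa_0, psa_1


# Hypothetical, strongly imbalanced toy data: 100 items, 6 disagreements.
annotator_a = [0] * 90 + [1] * 4 + [1] * 3 + [0] * 3
annotator_b = [0] * 90 + [1] * 4 + [0] * 3 + [1] * 3
kappa, psa_0, psa_1 = agreement(annotator_a, annotator_b)
print(round(kappa, 4), round(psa_0, 4), round(psa_1, 4))
```

On this toy data the annotators agree on 94 of 100 items, yet kappa is only about 0.54; the label-wise values (about 0.97 for the 0-label and 0.57 for the 1-label) show that the disagreement is concentrated in the rare class.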
We therefore also used the PSA coefficient, which reports agreement separately for each label. For the subjective category the agreement is high for the 1-label class but not for the 0-label class, and vice versa for the objective category. In contrast, agreement is high for both the positive and the negative category. The IAA is also high for the praise category. In the criticism category the agreement is high for the 0-label class (0.9071) but lower for the 1-label class (0.6616); this asymmetry between the two criticism classes results in a low kappa. We decided to focus on the positive and negative categories, where the IAA was substantial. In total, the two annotators disagreed on 300 sentences. These conflicts were resolved by an expert annotator in order to create a reliable corpus.

Boston Corpus: The randomly selected subset of tweets was annotated in the same way as the product reviews corpus by a domain expert from ILSP (see footnote 6). Again, we focus on the positive and negative categories.

III. FEATURES AND DATA PRE-PROCESSING
A. Features
ILSP provided two types of features for the sentiment experiments:

Lexical features, assigned to each token of a text after pre-processing with a custom Natural Language Processing pipeline in GATE (see Section III.B). Examples of such features are part-of-speech tags, punctuation, orthography etc.

Sentiment lexicon-based features, derived from the following lexica:
Opinion Lexicon [2, 3]: contains 4783 negative and 2006 positive opinion words, namely words used by writers and speakers to express their opinions toward some target.
ANEW (Affective Norms for English Words) [4]: contains 1034 words with valence, arousal and dominance scores. These words had previously been identified as bearing meaningful emotional content.
Lexicon [5]: contains 4075 attitude words classified in particular categories using specific syntactic and semantic criteria.
These attitude words are examined within the scope of Appraisal Theory [6] and are considered linguistic devices for expressing evaluations (criticism or praise) toward some target.
Intention Lexicon (see footnote 7): contains 355 words and expressions used to express future intentions such as commitments, promises, desires etc. Each entry is classified in particular categories using specific syntactic and semantic criteria.

4 The annotated corpus cannot be distributed without the permission of ILSP.
5 After removing the duplicates (re-tweets) we ended up with a corpus of 776 distinct tweets.
6 The annotated corpus cannot be distributed without the permission of ILSP.
7 This lexicon is being developed by ILSP (Pontiki Maria, Thanasis Kalogeropoulos and Haris Papageorgiou) and has not been published yet.

B. Data Pre-processing
The dataset was pre-processed using GATE (General Architecture for Text Engineering) by applying the following pipeline:
Fig. 3. NLP pipeline

The final output of the GATE pre-processing for each text is stored in an XML file.

IV. SENTIMENT ANALYSIS USING MACHINE LEARNING
A. Experiments on product reviews dataset
We split the 1235 reviews of the PRC into two parts: 70% was used for training and 30% for testing.

1) Feature extraction
We parsed each GATE XML file (using MATLAB) and calculated the following 32 features for each review:
1- Number of different categories in each sentence.
2- Number of NN tags.
3- Number of JJ tags.
4- Number of VB tags.
5- Number of RB tags.
6- Number of W tags.
7- Number of negation words.
8- Number of words detected by Opinion Lexicon.
9- Number of negative words detected by Opinion Lexicon.
10- Number of positive words detected by Opinion Lexicon.
11- Number of words detected by Lexicon.
12- Number of negative words detected by Lexicon.
13- Number of positive words detected by Lexicon.
14- Number of "both" words detected by Lexicon.
15- Number of JJ words detected by Lexicon.
16- Number of NN words detected by Lexicon.
17- Number of RB words detected by Lexicon.
18- Number of negative JJ words detected by Lexicon.
19- Number of negative NN words detected by Lexicon.
20- Number of negative RB words detected by Lexicon.
21- Number of positive JJ words detected by Lexicon.
22- Number of positive NN words detected by Lexicon.
23- Number of positive RB words detected by Lexicon.
24- Number of "both" JJ words detected by Lexicon.
25- Number of "both" NN words detected by Lexicon.
26- Number of "both" RB words detected by Lexicon.
27- Average of Valence Mean for words covered by ANEW Lexicon.
28- Average of Dominance Mean for words covered by ANEW Lexicon.
29- Average of Arousal Mean for words covered by ANEW Lexicon.
30- Number of Desire words detected using Intention Lexicon.
31- Number of Commitment words detected using Intention Lexicon.
32- Number of Purpose words detected using Intention Lexicon.

Table II presents a statistical analysis of the extracted features.

TABLE II. FEATURES PROPERTIES (columns: Min, Max, Mean, Mode, Variance, Range; rows: the 32 features listed above)
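The count-based features above can be illustrated with a short sketch. It is written in Python rather than the MATLAB used in the experiments, it covers only features 2 to 10, and it makes several assumptions: the input is a list of (word, tag) pairs with Penn Treebank style tags, tags are grouped by prefix (for example VBZ counts as VB), and the three tiny word sets are hypothetical stand-ins for the real lexica and negation list.

```python
from collections import Counter

# Hypothetical toy lexica; the real Opinion Lexicon has thousands of entries.
POSITIVE_OPINION = {"perfectly", "great", "good"}
NEGATIVE_OPINION = {"bad", "broken", "worst"}
NEGATIONS = {"not", "no", "never", "neither"}

TAG_PREFIXES = ("NN", "JJ", "VB", "RB", "W")


def extract_features(tagged_tokens):
    """Count POS-tag groups, negations and opinion words in one sentence.

    tagged_tokens: list of (word, pos_tag) pairs, assumed to use
    Penn Treebank style tags.
    """
    tag_counts = Counter()
    for _, tag in tagged_tokens:
        for prefix in TAG_PREFIXES:
            if tag.startswith(prefix):
                tag_counts[prefix] += 1
                break

    words = [word.lower() for word, _ in tagged_tokens]
    n_positive = sum(w in POSITIVE_OPINION for w in words)
    n_negative = sum(w in NEGATIVE_OPINION for w in words)
    return {
        "n_nn": tag_counts["NN"],
        "n_jj": tag_counts["JJ"],
        "n_vb": tag_counts["VB"],
        "n_rb": tag_counts["RB"],
        "n_w": tag_counts["W"],
        "n_negations": sum(w in NEGATIONS for w in words),
        "n_opinion": n_positive + n_negative,
        "n_negative_opinion": n_negative,
        "n_positive_opinion": n_positive,
    }


# One of the example reviews from Section II.A, tagged by hand for illustration.
review = [("the", "DT"), ("product", "NN"), ("has", "VBZ"), ("worked", "VBN"),
          ("perfectly", "RB"), ("for", "IN"), ("me", "PRP"), ("on", "IN"),
          ("my", "PRP$"), ("xp", "NN")]
features = extract_features(review)
print(features)
```

The remaining features follow the same pattern: each one is either a count of tokens that satisfy a lexicon and tag condition, or an average of lexicon scores over the covered tokens.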
2) Experiments
For the sentiment analysis task several experiments were conducted using different machine learning (ML) algorithms. To evaluate the learnt models we used accuracy, defined as the number of correctly classified instances divided by the total number of instances. Our models keep all the features listed in the previous section, since the feature selection experiments we ran using various methods (Information Gain, Mutual Information and Ranking Algorithm) did not show any improvement. As a baseline we used a majority classifier, which always predicts the category that is most frequent in the training data. The accuracy of this baseline also indicates the level of difficulty of the task. Below we present the experiments on the product reviews dataset using various ML methods:

K-Nearest Neighbors (k-NN): k-NN is a simple ML algorithm that assigns a test instance to the majority category of its k nearest training examples. We tried a wide range of k values (k = 1, ..., 100) and found, using cross-validation on the training set, that the optimal value is 21. As a distance measure between feature vectors we used Euclidean distance. We also used Principal Component Analysis (PCA) to remove correlations between features. PCA improves accuracy for both the positive and the negative category (see Tables III and IV).

Naïve Bayes: Another well-known ML algorithm is Naïve Bayes (NB). NB assumes that the feature variables (x_1, ..., x_n) are independent given the class c (category). The distributions P(x_i | c) of these variables are estimated from the training data. When a test instance defined by its feature vector x is given to NB, it is assigned to the class c with the highest P(c | x). The latter probability is computed using Bayes' theorem and the learnt P(x_i | c) probabilities.
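The Naïve Bayes decision rule described above can be sketched in a few lines. The paper does not state which distribution was used for P(x_i | c); the sketch below assumes a Gaussian per feature and class, and the function names and the two-feature toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate P(c) and a Gaussian P(x_i | c) per feature from training data."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, x):
    """Return the class c maximising log P(c) + sum_i log P(x_i | c)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


# Hypothetical toy features: [number of negative words, number of positive words].
X_train = [[3, 0], [4, 1], [5, 0], [0, 3], [1, 4], [0, 5]]
y_train = ["negative"] * 3 + ["positive"] * 3
nb_model = fit_gaussian_nb(X_train, y_train)
print(predict(nb_model, [4, 0]), predict(nb_model, [0, 4]))
```

Working with log-probabilities avoids numerical underflow when many per-feature probabilities are multiplied, and the denominator P(x) of Bayes' theorem can be dropped because it is the same for every class.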
SVM with MLP or RBF kernel: We also tried Support Vector Machines (SVM), which attempt to learn a separating hyperplane for the given classes (categories) from the training data. We experimented with a Multilayer Perceptron (MLP) kernel and a Radial Basis Function (RBF) kernel. The best accuracy was obtained using the RBF kernel. We tuned the model parameters on the training set using Genetic Algorithms (GA), which significantly improved accuracy for both kernels (see Tables III and IV). As before, we used PCA to remove correlations between features. The model with the best results is the SVM with RBF kernel, which we use in our experiments with Twitter data.

TABLE III. ACCURACY OF DIFFERENT METHODS FOR NEGATIVE CATEGORY (columns: Method; Train: cross-validation in training set, full training set; Test. Rows: Baseline; k-NN (k=21); k-NN + PCA (k=21); SVM MLP + PCA, two variants; SVM RBF + PCA, two variants; Naïve Bayes; Naïve Bayes + PCA; two further Naïve Bayes variants)

TABLE IV. ACCURACY OF DIFFERENT METHODS FOR POSITIVE CATEGORY (same columns and rows as Table III)

We also wanted to assess how much better our models predict the target (category) as we increase the number of training instances. We therefore built models using 10%, 20%, ..., 100% of the training set and evaluated them on the test set. The results are shown in Figures 4, 5, 6 and 7 for the negative category and in Figures 8, 9, 10 and 11 for the positive category. k-NN with PCA, SVM with RBF or MLP kernel using parameter tuning and PCA, and Naïve Bayes using parameter tuning and PCA improve their accuracy as we add more training data.

Fig. 4. KNN learning curve, comparison of using and not using PCA (KNN Learning Curve on Negative Class)
Fig. 5. Naive Bayes learning curves, comparison of using and not using PCA and parameter tuning (Naive Bayes Learning Curve on Negative Class)
Fig. 6. SVM-RBF learning curve using PCA compared with the same method with tuned parameters (SVM-RBF Learning Curve on Negative Class)
Fig. 7. SVM-MLP learning curve using PCA compared with the same method with tuned parameters (SVM-MLP Learning Curve on Negative Class)
Fig. 8. KNN learning curve, comparison of using and not using PCA (KNN Learning Curve on Positive Class)
Fig. 9. Naive Bayes learning curves, comparison of using and not using PCA and parameter tuning (Naive Bayes Learning Curve on Positive Class)
Fig. 10. SVM-RBF learning curve using PCA compared with the same method with tuned parameters (SVM-RBF Learning Curve on Positive Class)
Fig. 11. SVM-MLP learning curve using PCA compared with the same method with tuned parameters (SVM-MLP Learning Curve on Positive Class)
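The learning-curve procedure behind these figures (train on 10%, 20%, ..., 100% of the training set and evaluate each model on the same test set) can be sketched with the k-NN classifier with Euclidean distance described earlier. This is a minimal illustration, not the authors' code: the function names, the value k = 3 and the synthetic two-dimensional data are assumptions, and PCA is omitted.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training vectors closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    """Proportion of test instances classified in the correct category."""
    correct = sum(knn_predict(train_X, train_y, x, k) == y
                  for x, y in zip(test_X, test_y))
    return correct / len(test_X)


def learning_curve(train_X, train_y, test_X, test_y, k=3, steps=10):
    """Accuracy on a fixed test set for growing prefixes of the training set."""
    curve = []
    for step in range(1, steps + 1):
        n = max(1, len(train_X) * step // steps)
        acc = accuracy(train_X[:n], train_y[:n], test_X, test_y, k)
        curve.append((step / steps, acc))
    return curve


# Hypothetical, well-separated toy data: class 0 near (0, 0), class 1 near (10, 10).
train_y = [i % 2 for i in range(20)]
train_X = [(label * 10 + i % 3, label * 10 + i % 5)
           for i, label in enumerate(train_y)]
test_X = [(1, 1), (9, 9), (0, 2), (11, 12)]
test_y = [0, 1, 0, 1]

curve = learning_curve(train_X, train_y, test_X, test_y)
for fraction, acc in curve:
    print(f"{fraction:.0%} of training data: accuracy {acc:.2f}")
```

On real data the training set should be shuffled once before taking prefixes, so that every prefix contains a representative mix of both classes.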
B. Experiments on BC
We used our best models (SVM RBF + PCA + tuning), trained on 70% of the product reviews, and evaluated them on the 776 tweets. The models achieve an accuracy of 64.1% and 77.7% for the positive and negative class, respectively. Both outperform the corresponding majority baselines.

TABLE V. SVM RBF + PCA + TUNING (columns: Method, Positive class, Negative class; rows: Majority baseline, SVM RBF + PCA)

We also created a training set using 70% of the product reviews and 70% of the labeled Twitter data. Similarly, we created a test set by combining the remaining 30% of the two datasets. We then trained models by progressively adding more training data, as in the previous section. As shown in Figure 12, our models achieve better accuracy as more training instances are added.

Fig. 12. Learning curve of SVM-RBF using GA on the combined dataset (positive and negative labels)

The following table shows the accuracy of our classifier trained on 100% of the training set. This model was used to classify the remaining 3963 tweets from the BC, and its output was fed to the data visualization algorithm.

TABLE VI. SVM RBF + PCA + TUNING (columns: Method, Positive class, Negative class; rows: Majority, SVM RBF + PCA)

V. DATA VISUALIZATION
Two types of data graph visualizations are presented:

A. Hashtag graph
This graph presents the most important (most frequently discussed) topics of the 3963 tweets using a D3 bubble chart. Bigger bubbles correspond to more frequently discussed topics.

Fig. 13. General Hashtags Bubble Chart
Fig. 14. General Hashtags Bubble Chart using logarithmic scale for radial size

As seen in the above figures, each topic corresponds to a set of Twitter hashtags whose names derive from one another through minor lexical or stylistic transformations (e.g. "BostonMarathon", "bostonmarathon"). These hashtags are detected using simple heuristics and/or Levenshtein (edit) distance. A graph with the consolidated hashtags is shown below:
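One way to consolidate hashtags with the Levenshtein distance is sketched here. The paper does not describe its heuristics in detail, so everything beyond the edit distance itself is an assumption: lower-casing as the "simple heuristic", a distance threshold of 1, greedy merging into the most frequent variant, and the hypothetical hashtag counts.

```python
def levenshtein(a, b):
    """Edit distance between two strings via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def consolidate(hashtag_counts, max_distance=1):
    """Greedily merge hashtag variants into topics, most frequent first.

    Returns a dict mapping each canonical (lower-cased) hashtag to the
    summed count of all variants merged into it.
    """
    topics = {}
    ordered = sorted(hashtag_counts.items(), key=lambda item: -item[1])
    for tag, count in ordered:
        key = tag.lower()
        match = next((topic for topic in topics
                      if levenshtein(key, topic) <= max_distance), None)
        target = match if match is not None else key
        topics[target] = topics.get(target, 0) + count
    return topics


# Hypothetical hashtag frequencies for illustration only.
counts = {"bostonmarathon": 120, "BostonMarathon": 80, "bostonmarathons": 5,
          "PrayForBoston": 60, "prayforboston": 40}
topics = consolidate(counts)
print(topics)  # {'bostonmarathon': 205, 'prayforboston': 100}
```

The summed counts per topic are what would determine the bubble radius in the chart; a larger threshold merges more aggressively and risks joining unrelated short hashtags.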
Fig. 15. Bubble Chart after the hashtag consolidation
Fig. 16. Bubble Chart after the hashtag consolidation using logarithmic scale for radial size

B. Sentiment Graph
This graph presents the frequency of positive and negative tweets over time. As shown in Figures 17 and 18, the number of tweets about the Boston Marathon was relatively small at the beginning but increased rapidly after the explosions. The figures also show that negative tweets came to dominate over positive ones as time passed, since more people expressed their sadness or anger about the event. The many positive tweets that are still detected are mainly due to people expressing hopes and wishes (e.g. "Best wishes to those at #BostonMarathon", "I hope everyone is ok").

Fig. 17. The distribution of positive and negative tweets on a four hour time frame
Fig. 18. The distribution of positive and negative tweets per hour

VI. CONCLUSIONS AND FUTURE WORK
We have experimented with a variety of well-known machine learning algorithms to predict the expression of positive or negative sentiment in social media data. We have shown that a Support Vector Machine with RBF kernel obtains the best results for both categories on a dataset of product reviews. We have also shown that the same classifier achieves competitive results on a different domain (Twitter dataset). Using the D3 JavaScript library, we also created a concise visualization summary of the data. This visualization presents in a user-friendly way a) the most important topics discussed and b) the dominant sentiment expressed in the data over time. In future work we plan to assess the effectiveness of each lexicon and to test different feature sets and machine learning algorithms (e.g. Logistic Regression). In addition, we would like to perform an error analysis to detect the cases in which our classifier fails to predict the correct sentiment.
Furthermore, a more sophisticated visualization is planned in which we will present the dominant topics per time unit separately for each sentiment category.
REFERENCES
[1] Andrew B. Goldberg, Nathanael Fillmore, David Andrzejewski, Zhiting Xu, Bryan Gibson and Xiaojin Zhu. "May All Your Wishes Come True: A Study of Wishes and How to Recognize Them." Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2009).
[2] Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews." Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.
[3] Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
[4] Bradley, M., & Lang, P. (1999). Affective norms for English words (ANEW): Stimuli, instruction manual and affective ratings. Technical report C-1, Gainesville, FL: University of Florida.
[5] Pontiki Maria, Aggelou Zoe, Maltezou Sofia & Papageorgiou Haris (2013). Sentiment Analysis: Building Bilingual Lexical Resources. To be published in the Proceedings of the 13th International Conference on Greek Linguistics, September 26-29, 2013.
[6] Martin, J.R. and White, P.R.R. (2005). The Language of Evaluation: Appraisal in English. Palgrave Macmillan, London & New York.
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationHeuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger
Page 1 of 35 Heuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger Kaihong Liu, MD, MS, Wendy Chapman, PhD, Rebecca Hwa, PhD, and Rebecca S. Crowley, MD, MS
More informationWriting a Basic Assessment Report. CUNY Office of Undergraduate Studies
Writing a Basic Assessment Report What is a Basic Assessment Report? A basic assessment report is useful when assessing selected Common Core SLOs across a set of single courses A basic assessment report
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationArtificial Neural Networks written examination
1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14
More informationDisambiguation of Thai Personal Name from Online News Articles
Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online
More informationCross-Lingual Text Categorization
Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es
More informationOnline Updating of Word Representations for Part-of-Speech Tagging
Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationA Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique
A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique Hiromi Ishizaki 1, Susan C. Herring 2, Yasuhiro Takishima 1 1 KDDI R&D Laboratories, Inc. 2 Indiana University
More informationAUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION
JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders
More informationA Vector Space Approach for Aspect-Based Sentiment Analysis
A Vector Space Approach for Aspect-Based Sentiment Analysis by Abdulaziz Alghunaim B.S., Massachusetts Institute of Technology (2015) Submitted to the Department of Electrical Engineering and Computer
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationMovie Review Mining and Summarization
Movie Review Mining and Summarization Li Zhuang Microsoft Research Asia Department of Computer Science and Technology, Tsinghua University Beijing, P.R.China f-lzhuang@hotmail.com Feng Jing Microsoft Research
More informationOn-Line Data Analytics
International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationPostprint.
http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationThe Role of the Head in the Interpretation of English Deverbal Compounds
The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt
More informationLarge-Scale Web Page Classification. Sathi T Marath. Submitted in partial fulfilment of the requirements. for the degree of Doctor of Philosophy
Large-Scale Web Page Classification by Sathi T Marath Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at Dalhousie University Halifax, Nova Scotia November 2010
More informationTime series prediction
Chapter 13 Time series prediction Amaury Lendasse, Timo Honkela, Federico Pouzols, Antti Sorjamaa, Yoan Miche, Qi Yu, Eric Severin, Mark van Heeswijk, Erkki Oja, Francesco Corona, Elia Liitiäinen, Zhanxing
More informationMULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY
MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract
More informationA heuristic framework for pivot-based bilingual dictionary induction
2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,
More informationImproving Machine Learning Input for Automatic Document Classification with Natural Language Processing
Improving Machine Learning Input for Automatic Document Classification with Natural Language Processing Jan C. Scholtes Tim H.W. van Cann University of Maastricht, Department of Knowledge Engineering.
More informationMeasuring the relative compositionality of verb-noun (V-N) collocations by integrating features
Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Sriram Venkatapathy Language Technologies Research Centre, International Institute of Information Technology
More informationExtracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models
Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),
More informationArticle A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek
Article A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek Vasileios Athanasiou and Manolis Maragoudakis * Artificial
More informationTHE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING
SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,
More informationText-mining the Estonian National Electronic Health Record
Text-mining the Estonian National Electronic Health Record Raul Sirel rsirel@ut.ee 13.11.2015 Outline Electronic Health Records & Text Mining De-identifying the Texts Resolving the Abbreviations Terminology
More informationTruth Inference in Crowdsourcing: Is the Problem Solved?
Truth Inference in Crowdsourcing: Is the Problem Solved? Yudian Zheng, Guoliang Li #, Yuanbing Li #, Caihua Shan, Reynold Cheng # Department of Computer Science, Tsinghua University Department of Computer
More informationMultivariate k-nearest Neighbor Regression for Time Series data -
Multivariate k-nearest Neighbor Regression for Time Series data - a novel Algorithm for Forecasting UK Electricity Demand ISF 2013, Seoul, Korea Fahad H. Al-Qahtani Dr. Sven F. Crone Management Science,
More informationSpeech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines
Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,
More informationMatching Similarity for Keyword-Based Clustering
Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web
More informationarxiv: v1 [cs.lg] 3 May 2013
Feature Selection Based on Term Frequency and T-Test for Text Categorization Deqing Wang dqwang@nlsde.buaa.edu.cn Hui Zhang hzhang@nlsde.buaa.edu.cn Rui Liu, Weifeng Lv {liurui,lwf}@nlsde.buaa.edu.cn arxiv:1305.0638v1
More informationarxiv: v1 [cs.lg] 15 Jun 2015
Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy arxiv:1506.04477v1 [cs.lg] 15 Jun 2015 Sang-Woo Lee Min-Oh Heo School of Computer Science and
More information11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation
tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each
More informationDetecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011
Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011 Cristian-Alexandru Drăgușanu, Marina Cufliuc, Adrian Iftene UAIC: Faculty of Computer Science, Alexandru Ioan Cuza University,
More informationGenre classification on German novels
Genre classification on German novels Lena Hettinger, Martin Becker, Isabella Reger, Fotis Jannidis and Andreas Hotho Data Mining and Information Retrieval Group, University of Würzburg Email: {hettinger,
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationTraining a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski
Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski Problem Statement and Background Given a collection of 8th grade science questions, possible answer
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationExposé for a Master s Thesis
Exposé for a Master s Thesis Stefan Selent January 21, 2017 Working Title: TF Relation Mining: An Active Learning Approach Introduction The amount of scientific literature is ever increasing. Especially
More informationBANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS
Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.
More informationUniversity of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma
University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of
More informationLip reading: Japanese vowel recognition by tracking temporal changes of lip shape
Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,
More informationEmotions from text: machine learning for text-based emotion prediction
Emotions from text: machine learning for text-based emotion prediction Cecilia Ovesdotter Alm Dept. of Linguistics UIUC Illinois, USA ebbaalm@uiuc.edu Dan Roth Dept. of Computer Science UIUC Illinois,
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationPREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES
PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationCLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH
ISSN: 0976-3104 Danti and Bhushan. ARTICLE OPEN ACCESS CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH Ajit Danti 1 and SN Bharath Bhushan 2* 1 Department
More informationP. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou, C. Skourlas, J. Varnas
Exploiting Distance Learning Methods and Multimediaenhanced instructional content to support IT Curricula in Greek Technological Educational Institutes P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou,
More informationSEMAFOR: Frame Argument Resolution with Log-Linear Models
SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon
More informationChapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard
Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.
More information