Sentiment Analysis and Visualization of Social Media Data


The #BostonMarathon #Bombings test case

Amir Salarpour, Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran
Mohammad Hossein Bamneshin, Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran
Dimitris Proios, Department of Information and Telematics, Harakopio University of Athens, Athens, Greece

Abstract This work aims a) to perform sentiment analysis on social media data using machine learning methods and b) to propose a user-friendly visualization of these data.

Keywords sentiment analysis, data visualization, machine learning

I. INTRODUCTION

The goal of this project is a) to perform sentiment analysis (SA) on social media text data (e.g. product reviews, tweets) using machine learning algorithms and b) to create a visualization summary of these data that also takes the SA output into account.

Fig. 1. Overall system's workflow

To perform sentiment analysis we pre-processed the text data using GATE (General Architecture for Text Engineering). The output was used to construct feature vectors (feature extraction), which in turn were used to train several machine learning models. Finally, the learnt models were evaluated in terms of accuracy, which is the proportion of test instances (reviews) that were classified in the correct category.

Fig. 2. Sentiment analysis flow

For the visualization task we used D3 (Data-Driven Documents), a JavaScript library that provides functionality to display data in graphical charts.

II. DATASET AND ANNOTATION

A. Dataset Description

The following two datasets were used for the experiments:

Product Reviews Corpus (PRC). The PRC is a part of the Wishful Expressions Corpora [1]. It contains 1235 sentences with customer product reviews from Amazon.com and cnet.com, collected by Bing Liu¹ and his colleagues, and used in several publications.
Two examples of such reviews are the following:

i will never buy their product again at this rate and neither should you
the product has worked perfectly for me on my xp

This corpus was annotated by ILSP² with respect to specific sentiment categories (see Section II.B below).

Boston Corpus (BC). The BC has been collected and annotated by ILSP³. It contains 5000 tweets related to the Marathon event and the bombings that took place in Boston on 15/4/2013. The Marathon started at 09:00 and the explosions occurred at 14:59. The tweets were collected in the timeframe between 14/4/2013 at 21:00 and 15/04/2013 at 19:46. Some examples of these tweets are the following:

Excited for the #bostonmarathon tomorrow #makeitcount #findgreatness
Best of luck to all those running #bostonmarathon today! Have FUN and enjoy!!
#bostonmarathon explosions! Horrifying site :( let's pray for the may affected
#PrayersforBoston our prayers go out to Boston this afternoon.

We randomly selected and annotated a subset of 1027 tweets.⁵ This sub-corpus was initially used to test the best classifier trained on the product reviews corpus. In a second phase, it was combined with the product reviews corpus in order to train the final sentiment model.

¹ The original data and publications can be found at
² Institute for Language and Speech Processing, Athena R.C.
³ This corpus cannot be distributed without the permission of ILSP. Any questions on this data should be directed to Haris Papageorgiou (xaris@ilsp.gr)

B. Annotation and Inter-Annotator Agreement (IAA)

Product Reviews Corpus: Each sentence of the PRC was judged⁴ by two annotators with respect to the following categories:

Subjective: any text about private states (sentiments, opinions, emotions, feelings, thoughts, beliefs etc.) expressed by an author.
Objective: text that contains only factual information.
Positive: any text containing positive opinions, emotions, feelings etc. or facts/events that may trigger positive sentiment.
Negative: any text containing negative opinions, emotions, feelings etc. or facts/events that may trigger negative sentiment.
Praise: positive evaluations and opinions about specific entities or topics and their aspects, explicitly or implicitly expressed by an author.
Criticism: negative evaluations and opinions about specific entities or topics and their aspects, explicitly or implicitly expressed by an author.

To assess the agreement of the two annotators we used Cohen's kappa coefficient and PSA (Proportion of Specific Agreement). The results are shown in Table I.

TABLE I. INTER-ANNOTATOR AGREEMENT
Method: Sub, Obj, Pos, Neg, Prais, Crit
Kappa: ,6785 0,5690
PSA for 0-label: ,8749 0,9071
PSA for 1-label: ,8029 0,6616

When the two classes of a category have very different numbers of instances, the kappa coefficient can be misleading.
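The effect of class imbalance on kappa can be illustrated with a short Python sketch. It is not taken from the paper: the function name `agreement_stats` and the toy annotations are assumptions made for illustration, and the formulas are the standard definitions of Cohen's kappa and of the proportion of specific agreement for binary labels.

```python
def agreement_stats(a, b):
    """Return (kappa, psa0, psa1) for two binary label sequences."""
    n = len(a)
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # both say 1
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # both say 0
    disagree = n - n11 - n00                                  # one says 0, other 1

    p_o = (n11 + n00) / n                 # observed agreement
    p_a1 = sum(a) / n                     # share of 1-labels, annotator A
    p_b1 = sum(b) / n                     # share of 1-labels, annotator B
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)  # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e)       # undefined if p_e == 1 (not handled)

    # Specific agreement: agreement restricted to one label at a time.
    psa1 = 2 * n11 / (2 * n11 + disagree)
    psa0 = 2 * n00 / (2 * n00 + disagree)
    return kappa, psa0, psa1


# Made-up, heavily imbalanced annotations: 90 joint 0s, 2 joint 1s, 8 conflicts.
ann_a = [0] * 90 + [1] * 2 + [1] * 4 + [0] * 4
ann_b = [0] * 90 + [1] * 2 + [0] * 4 + [1] * 4
kappa, psa0, psa1 = agreement_stats(ann_a, ann_b)
print(f"kappa={kappa:.3f}  PSA(0)={psa0:.3f}  PSA(1)={psa1:.3f}")
```

In this toy case the annotators agree on 92 of 100 items, yet kappa is only about 0.29, while the two PSA values (about 0.96 for the 0-label and 0.33 for the 1-label) show where the agreement actually lies.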
We therefore also used the PSA coefficient, which reports the agreement separately for each label. For the subjective category the agreement for the 1-label class is high but not for the 0-label one, and vice versa for the objective category. In contrast, the results show high agreement in the positive and negative categories. The IAA is also high for the praise category. In the criticism category the agreement is high for the 0-label class (0.9071) but lower for the 1-label class (0.6616); this asymmetry between the two criticism classes results in a low kappa. We decided to focus on the positive and negative categories, where the IAA was substantial. In total, the two annotators disagreed on 300 sentences. These conflicts were resolved by an expert annotator in order to create a reliable corpus.

Boston Corpus: The randomly selected tweets subset was annotated similarly to the product reviews corpus by a domain expert from ILSP⁶. Again, we focus on the positive and negative categories.

III. FEATURES AND DATA PRE-PROCESSING

A. Features

ILSP provided two types of features for the sentiment experiments:

Lexical features, assigned to each token of a text after it has been pre-processed by a custom Natural Language Processing pipeline in GATE (see Section III.B below). Examples of such features are the part-of-speech tags, punctuation, orthography etc.

Sentiment Lexicon-based features, resulting from the following lexica:

Opinion Lexicon [2, 3]: It contains 4783 negative and 2006 positive opinion words, namely words used by writers and speakers to express their opinions toward some target.
ANEW (Affective Norms of English Words) [4]: It contains 1034 words with valence, arousal and dominance scores. These words had been previously identified as bearing meaningful emotional content.
Lexicon [5]: It contains 4075 attitude words classified in particular categories using specific syntactic and semantic criteria.
These words are examined within the scope of Appraisal Theory [6] and are considered a linguistic device for expressing evaluations (criticism or praise) toward some target.
Intention Lexicon⁷: It contains 355 words and expressions used to express future intentions such as commitments, promises, desires etc. Each entry is classified in particular categories using specific syntactic and semantic criteria.

⁴ The annotated corpus cannot be distributed without the permission of ILSP.
⁵ After removing the duplicates (re-tweets) we ended up with a corpus of 776 distinct tweets.
⁶ The annotated corpus cannot be distributed without the permission of ILSP.
⁷ This lexicon is being developed by ILSP (Pontiki Maria, Thanasis Kalogeropoulos and Haris Papageorgiou) and is not published yet.

B. Data Pre-processing

The dataset was pre-processed using GATE (General Architecture for Text Engineering) by applying the following pipeline:

Fig. 3. NLP pipeline

The final output of the GATE pre-processing for each text is stored in an XML file.

IV. SENTIMENT ANALYSIS USING MACHINE LEARNING

A. Experiments on product reviews dataset

We split the 1235 reviews of the PRC into two parts: 70% was used for training and 30% for testing.

1) Feature extraction

We parsed each GATE XML file (using MATLAB) and calculated the following 32 features for each review:

1- Number of different categories in each sentence.
2- Number of NN tags.
3- Number of JJ tags.
4- Number of VB tags.
5- Number of RB tags.
6- Number of W tags.
7- Number of negation words.
8- Number of words detected by Opinion Lexicon.
9- Number of negative words detected by Opinion Lexicon.
10- Number of positive words detected by Opinion Lexicon.
11- Number of words detected by Lexicon.
12- Number of negative words detected by Lexicon.
13- Number of positive words detected by Lexicon.
14- Number of both words detected by Lexicon.
15- Number of JJ words detected by Lexicon.
16- Number of NN words detected by Lexicon.
17- Number of RB words detected by Lexicon.
18- Number of negative and JJ words detected by Lexicon.
19- Number of negative and NN words detected by Lexicon.
20- Number of negative and RB words detected by Lexicon.
21- Number of positive and JJ words detected by Lexicon.
22- Number of positive and NN words detected by Lexicon.
23- Number of positive and RB words detected by Lexicon.
24- Number of both and JJ words detected by Lexicon.
25- Number of both and NN words detected by Lexicon.
26- Number of both and RB words detected by Lexicon.
27- Average of Valence Mean for words covered by ANEW Lexicon.
28- Average of Dominance Mean for words covered by ANEW Lexicon.
29- Average of Arousal Mean for words covered by ANEW Lexicon.
30- Number of Desire words discovered using Intention Lexicon.
31- Number of Commitment words discovered using Intention Lexicon.
32- Number of Purpose words discovered using Intention Lexicon.

In Table II we present a statistical analysis of the extracted features.

TABLE II. FEATURES PROPERTIES (columns: Min, Max, Mean, Mode, Variance, Range; one row per feature listed above)
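A few of the count and average features listed above can be sketched in Python as follows. This is not the authors' MATLAB code and it does not read GATE XML: the input is assumed to be an already tokenized sentence with Penn Treebank style POS tags, the function name `extract_features` is hypothetical, and the tiny word sets and valence scores are made-up stand-ins for the Opinion Lexicon and ANEW. Returning 0.0 when no word is covered by the valence lexicon is also an assumption.

```python
# Tag prefixes counted in features 2-6 (e.g. "NNS" counts as NN, "WP" as W).
POS_PREFIXES = ("NN", "JJ", "VB", "RB", "W")

# Made-up stand-ins for the real lexica (assumptions for illustration only).
NEGATIONS = {"not", "never", "no", "neither", "n't"}
TOY_POSITIVE = {"perfectly", "great", "fun"}
TOY_NEGATIVE = {"horrifying", "broken"}
TOY_VALENCE = {"fun": 8.0, "horrifying": 2.0}  # invented scores, not ANEW data


def extract_features(tagged_tokens):
    """Map a list of (word, pos_tag) pairs to a dict of numeric features."""
    feats = {f"n_{p}": 0 for p in POS_PREFIXES}
    for _, tag in tagged_tokens:
        for prefix in POS_PREFIXES:
            if tag.startswith(prefix):
                feats[f"n_{prefix}"] += 1
                break

    words = [w.lower() for w, _ in tagged_tokens]
    feats["n_negations"] = sum(w in NEGATIONS for w in words)
    feats["n_opinion_pos"] = sum(w in TOY_POSITIVE for w in words)
    feats["n_opinion_neg"] = sum(w in TOY_NEGATIVE for w in words)
    feats["n_opinion"] = feats["n_opinion_pos"] + feats["n_opinion_neg"]

    # Average valence over the words the (toy) valence lexicon covers.
    valences = [TOY_VALENCE[w] for w in words if w in TOY_VALENCE]
    feats["avg_valence"] = sum(valences) / len(valences) if valences else 0.0
    return feats


# Second example review from Section II.A, tagged by hand.
review = [("the", "DT"), ("product", "NN"), ("has", "VBZ"), ("worked", "VBN"),
          ("perfectly", "RB"), ("for", "IN"), ("me", "PRP"), ("on", "IN"),
          ("my", "PRP$"), ("xp", "NN")]
features = extract_features(review)
print(features)
```

Each review then becomes one fixed-length numeric vector (32 values in the paper, 10 in this sketch), which is the form the classifiers in the next subsection expect.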

2) Experiments

For the sentiment analysis task several experiments were conducted using different machine learning (ML) algorithms. To evaluate the learnt models we used accuracy, which is defined as the number of correctly classified instances divided by the total number of instances. In our models we kept all the features listed in the previous section, since the feature selection experiments we ran using various methods (Information Gain, Mutual Information and Ranking Algorithm) did not show any improvement. As a baseline system we used a majority classifier, which always chooses the category that is most frequent in the training data. The accuracy of this method also indicates the level of difficulty of the task. Below we present the experiments on the product reviews dataset using various ML methods:

K-Nearest Neighbors (k-NN): We tried k-NN, a simple ML algorithm that assigns a test instance to the majority category of its k nearest training examples. We tried a wide range of k values (k = 1, …, 100) and found, using cross-validation on the training set, that the optimal value is 21. As a distance measure between feature vectors we used Euclidean distance. We also used Principal Component Analysis (PCA) to remove correlations between features. PCA improves accuracy for both the positive and negative categories (see Tables III and IV).

Naïve Bayes: Another well-known ML algorithm is Naïve Bayes (NB). NB assumes that the feature variables $(x_1, \ldots, x_n)$ are independent given the class $c$ (category). The distributions $P(x_i \mid c)$ of these variables are estimated from the training data. When a test instance defined by its feature vector $x$ is given to NB, it is assigned to the class $c$ that has the highest $P(c \mid x)$. The latter probability is estimated using Bayes' theorem and the learnt $P(x_i \mid c)$ probabilities.
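The evaluation setup described above (accuracy, the majority baseline, and k-NN with Euclidean distance and a cross-validated k) can be sketched in plain Python. The function names and the two-dimensional toy data are assumptions made for illustration; the paper does not state which cross-validation scheme was used, so leave-one-out is used here as one possible choice.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Correctly classified instances divided by the total number of instances."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_y):
    """Return the label that is most frequent in the training data."""
    return Counter(train_y).most_common(1)[0][0]


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training points closest to x (Euclidean)."""
    neighbours = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def choose_k(train_X, train_y, candidates):
    """Pick k by leave-one-out cross-validation on the training set."""
    best_k, best_acc = None, -1.0
    for k in candidates:
        preds = [
            knn_predict(train_X[:i] + train_X[i + 1:],
                        train_y[:i] + train_y[i + 1:], train_X[i], k)
            for i in range(len(train_X))
        ]
        acc = accuracy(train_y, preds)
        if acc > best_acc:  # keep the smallest k among ties
            best_k, best_acc = k, acc
    return best_k


# Made-up two-feature toy data: label 0 near the origin, label 1 near (5.5, 5.5).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = [0, 0, 0, 1, 1, 1, 1]
test_X = [(0.5, 0.5), (5.5, 5.5), (1, 1)]
test_y = [0, 1, 0]

majority = majority_baseline(train_y)
baseline_acc = accuracy(test_y, [majority] * len(test_y))
best_k = choose_k(train_X, train_y, candidates=[1, 3, 5])
knn_acc = accuracy(test_y, [knn_predict(train_X, train_y, x, 3) for x in test_X])
print(majority, baseline_acc, best_k, knn_acc)
```

On this toy data the majority baseline reaches an accuracy of 1/3 while 3-NN classifies all three test points correctly, which mirrors how the baseline is used in the paper as a lower bound for the learnt models.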
SVM with MLP or RBF kernel: We also tried Support Vector Machines (SVM), which attempt to learn a separating hyperplane for the given classes (categories) from the training data. We experimented with the Multilayer Perceptron (MLP) kernel and the Radial Basis Function (RBF) kernel. The best accuracy was obtained using the RBF kernel. We tuned the model parameters on the training set using Genetic Algorithms (GA), which significantly improved accuracy for both kernels (see Tables III and IV). As before, we used PCA to remove correlations between features. The model with the best results is the SVM with RBF kernel, which we use in our experiments with Twitter data.

TABLE III. ACCURACY OF DIFFERENT METHODS FOR NEGATIVE CATEGORY (columns: Method; Train, with cross-validation in training set and with full training set; Test. Rows: Baseline, k-NN (k=21), k-NN + PCA (k=21), SVM MLP + PCA, SVM RBF + PCA, Naïve Bayes and Naïve Bayes + PCA, the SVM and Naïve Bayes rows with and without parameter tuning)

TABLE IV. ACCURACY OF DIFFERENT METHODS FOR POSITIVE CATEGORY (same columns and rows as Table III)

We also wanted to assess how much better our models predict the target (category) as we increase the number of training instances. We therefore built models using 10%, 20%, …, 100% of the training set and evaluated them on the test set. The results are shown in Figures 4, 5, 6 and 7 for the negative category and in Figures 8, 9, 10 and 11 for the positive category. K-NN with PCA, SVM with RBF or MLP kernel with parameter tuning and PCA, and Naïve Bayes with parameter tuning and PCA improve their accuracy as we add more training data.

Fig. 4. KNN learning curve on the negative class: comparison of using PCA and not using it
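The learning-curve procedure (70/30 split, then training on 10%, 20%, …, 100% of the training part and testing on the fixed test part) can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier is swapped in for the SVM, k-NN and Naïve Bayes models of the paper, and the data are synthetic; both are assumptions, as are all function names.

```python
import random


def nearest_centroid_fit(X, y):
    """Stand-in classifier (not from the paper): one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def nearest_centroid_predict(model, x):
    """Return the class whose centroid is closest to x (squared Euclidean)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def learning_curve(train_X, train_y, test_X, test_y, steps=10):
    """Train on growing prefixes of the training set, always test on the same set."""
    curve = []
    for i in range(1, steps + 1):
        n = max(1, len(train_X) * i // steps)
        model = nearest_centroid_fit(train_X[:n], train_y[:n])
        preds = [nearest_centroid_predict(model, x) for x in test_X]
        acc = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
        curve.append((i / steps, acc))
    return curve


# Synthetic two-class data (made up): class 1 is shifted by 2 in both features.
rng = random.Random(0)
data = []
for i in range(200):
    label = i % 2
    point = (rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0))
    data.append((point, label))
rng.shuffle(data)

split = len(data) * 7 // 10  # 70% for training, 30% for testing
train_X = [p for p, _ in data[:split]]
train_y = [l for _, l in data[:split]]
test_X = [p for p, _ in data[split:]]
test_y = [l for _, l in data[split:]]

curve = learning_curve(train_X, train_y, test_X, test_y)
for fraction, acc in curve:
    print(f"{fraction:.0%} of training set -> test accuracy {acc:.3f}")
```

Keeping the test set fixed while only the training prefix grows is what makes the points of the curve comparable with each other.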

Fig. 5. Naive Bayes learning curves on the negative class: comparison of using and not using PCA and parameter tuning
Fig. 6. SVM-RBF learning curve on the negative class: using PCA, compared with the same method with tuned parameters
Fig. 7. SVM-MLP learning curve on the negative class: using PCA, compared with the same method with tuned parameters
Fig. 8. KNN learning curve on the positive class: comparison of using PCA and not using it
Fig. 9. Naive Bayes learning curves on the positive class: comparison of using and not using PCA and parameter tuning
Fig. 10. SVM-RBF learning curve on the positive class: using PCA, compared with the same method with tuned parameters
Fig. 11. SVM-MLP learning curve on the positive class: using PCA, compared with the same method with tuned parameters

B. Experiments on BC

We used our best models (SVM RBF + PCA + tuning) trained on 70% of the product reviews and evaluated them on the 776 tweets. The models achieve an accuracy of 64.1% and 77.7% for the positive and negative class, respectively. Both outperform the corresponding majority baselines.

TABLE V. SVM RBF + PCA + TUNING (columns: Method, Positive class, Negative class; rows: Majority baseline, SVM RBF + PCA)

We also created a training set using 70% of the product reviews and 70% of the labeled Twitter data. Similarly, we created a test set by combining the remaining 30% of the two aforementioned datasets. We then trained models by progressively adding more training data, as in the previous section. As shown in Fig. 12, our models achieve better accuracy as more training instances are added.

Fig. 12. Learning curve of SVM-RBF using GA on the combined corpus

In the following table we show the accuracy of our classifier trained on 100% of the training set. This model was used to classify the remaining 3963 tweets from the BC, and the output was fed to the data visualization algorithm.

TABLE VI. SVM RBF + PCA + TUNING (columns: Method, Positive class, Negative class; rows: Majority, SVM RBF + PCA)

V. DATA VISUALIZATION

Two types of data graph visualizations are presented:

A. Hashtag graph

This graph presents the most frequently discussed topics of the 3963 tweets using a D3 bubble chart. Bigger bubbles correspond to more frequently discussed topics.

Fig. 13. General hashtags bubble chart
Fig. 14. General hashtags bubble chart using logarithmic scale for radial size

As seen in the above figures, each topic corresponds to a set of Twitter hashtags whose names derive from one another through minor lexical or stylistic transformations (e.g. BostonMarathon, bostonmarathon). These hashtags are detected using simple heuristics and/or Levenshtein (edit) distance. The graph with the consolidated hashtags is shown in Fig. 15 and Fig. 16.
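The hashtag consolidation step can be sketched as below. The paper only says that "simple heuristics and/or Levenshtein (edit) distance" are used, so the concrete choices here are assumptions: lowercasing and stripping the leading # as the heuristic, a distance threshold of 1, and merging each variant into the most frequent form seen so far. The counts are made up.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def consolidate(hashtag_counts, max_dist=1):
    """Merge hashtag variants into canonical topics and sum their counts."""
    merged = {}
    # Visit frequent tags first so that they become the canonical forms.
    for tag, count in sorted(hashtag_counts.items(), key=lambda kv: -kv[1]):
        norm = tag.lower().lstrip("#")
        for canon in merged:
            if levenshtein(norm, canon) <= max_dist:
                merged[canon] += count
                break
        else:
            merged[norm] = count
    return merged


# Made-up hashtag frequencies for illustration.
counts = {
    "#bostonmarathon": 300,
    "#BostonMarathon": 120,
    "#prayforboston": 60,
    "#PrayersforBoston": 40,
    "#bostonmarthon": 5,
}
topics = consolidate(counts)
print(topics)
```

With these inputs the three spellings of the marathon hashtag collapse into one topic with a combined count of 425, while "prayforboston" and "prayersforboston" stay separate because their edit distance is 3. A larger threshold would merge them, at the risk of merging unrelated short hashtags.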

Fig. 15. Bubble chart after the hashtag consolidation
Fig. 16. Bubble chart after the hashtag consolidation using logarithmic scale for radial size

B. Sentiment Graph

This graph presents the frequency of the positive and negative tweets over time. As shown in Fig. 17 and Fig. 18, the number of tweets about the Boston Marathon was relatively small in the beginning; after the bomb explosions, however, it increased rapidly. The negative tweets also came to dominate over the positive ones as time passed, since more people expressed their sadness or anger about the event. The fact that many positive tweets are detected (as shown in the graph) is mainly because many people expressed hopes and wishes (e.g. "Best wishes to those at #BostonMarathon", "I hope everyone is ok").

Fig. 17. The distribution of positive and negative tweets on a four-hour time frame
Fig. 18. The distribution of positive and negative tweets per hour

VI. CONCLUSIONS AND FUTURE WORK

We have experimented with a variety of well-known machine learning algorithms to predict the expression of positive or negative sentiment in social media data. We have shown that a Support Vector Machine with RBF kernel obtained the best results for both categories on a dataset of product reviews. We have also shown that the same classifier has competitive results on a different domain (Twitter dataset). Using the D3 JavaScript library, we also created a concise visualization summary of the data. This visualization presents in a user-friendly way a) the most important topics discussed and b) the dominant sentiment expressed in the data over time. In future work we plan to assess the effectiveness of each lexicon and to test different feature sets and machine learning algorithms (e.g. Logistic Regression). In addition, we would like to perform an error analysis to detect the cases in which our classifier fails to predict the correct sentiment.
Furthermore, a more sophisticated visualization is planned in which we will present the dominant topics per time unit separately for each sentiment category.

REFERENCES

[1] Andrew B. Goldberg, Nathanael Fillmore, David Andrzejewski, Zhiting Xu, Bryan Gibson and Xiaojin Zhu. "May All Your Wishes Come True: A Study of Wishes and How to Recognize Them." Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2009).
[2] Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews." Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.
[3] Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
[4] Bradley, M. and Lang, P. (1999). Affective Norms for English Words (ANEW): Stimuli, Instruction Manual and Affective Ratings. Technical Report C-1, Gainesville, FL: University of Florida.
[5] Pontiki Maria, Aggelou Zoe, Maltezou Sofia and Papageorgiou Haris (2013). Sentiment Analysis: Building Bilingual Lexical Resources. To be published in the Proceedings of the 13th International Conference on Greek Linguistics, September 26-29, 2013.
[6] Martin, J.R. and White, P.R.R. (2005). The Language of Evaluation: Appraisal in English. Palgrave Macmillan, London & New York.


Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio

Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio SCSUG Student Symposium 2016 Analyzing sentiments in tweets for Tesla Model 3 using SAS Enterprise Miner and SAS Sentiment Analysis Studio Praneth Guggilla, Tejaswi Jha, Goutam Chakraborty, Oklahoma State

More information

Word Segmentation of Off-line Handwritten Documents

Word Segmentation of Off-line Handwritten Documents Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

Beyond the Pipeline: Discrete Optimization in NLP

Beyond the Pipeline: Discrete Optimization in NLP Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We

More information

Extracting Verb Expressions Implying Negative Opinions

Extracting Verb Expressions Implying Negative Opinions Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Extracting Verb Expressions Implying Negative Opinions Huayi Li, Arjun Mukherjee, Jianfeng Si, Bing Liu Department of Computer

More information

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

Memory-based grammatical error correction

Memory-based grammatical error correction Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

Modeling function word errors in DNN-HMM based LVCSR systems

Modeling function word errors in DNN-HMM based LVCSR systems Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford

More information

Switchboard Language Model Improvement with Conversational Data from Gigaword

Switchboard Language Model Improvement with Conversational Data from Gigaword Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword

More information

Heuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger

Heuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger Page 1 of 35 Heuristic Sample Selection to Minimize Reference Standard Training Set for a Part-Of-Speech Tagger Kaihong Liu, MD, MS, Wendy Chapman, PhD, Rebecca Hwa, PhD, and Rebecca S. Crowley, MD, MS

More information

Writing a Basic Assessment Report. CUNY Office of Undergraduate Studies

Writing a Basic Assessment Report. CUNY Office of Undergraduate Studies Writing a Basic Assessment Report What is a Basic Assessment Report? A basic assessment report is useful when assessing selected Common Core SLOs across a set of single courses A basic assessment report

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

Artificial Neural Networks written examination

Artificial Neural Networks written examination 1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14

More information

Disambiguation of Thai Personal Name from Online News Articles

Disambiguation of Thai Personal Name from Online News Articles Disambiguation of Thai Personal Name from Online News Articles Phaisarn Sutheebanjard Graduate School of Information Technology Siam University Bangkok, Thailand mr.phaisarn@gmail.com Abstract Since online

More information

Cross-Lingual Text Categorization

Cross-Lingual Text Categorization Cross-Lingual Text Categorization Nuria Bel 1, Cornelis H.A. Koster 2, and Marta Villegas 1 1 Grup d Investigació en Lingüística Computacional Universitat de Barcelona, 028 - Barcelona, Spain. {nuria,tona}@gilc.ub.es

More information

Online Updating of Word Representations for Part-of-Speech Tagging

Online Updating of Word Representations for Part-of-Speech Tagging Online Updating of Word Representations for Part-of-Speech Tagging Wenpeng Yin LMU Munich wenpeng@cis.lmu.de Tobias Schnabel Cornell University tbs49@cornell.edu Hinrich Schütze LMU Munich inquiries@cislmu.org

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique

A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique Hiromi Ishizaki 1, Susan C. Herring 2, Yasuhiro Takishima 1 1 KDDI R&D Laboratories, Inc. 2 Indiana University

More information

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION

AUTOMATIC DETECTION OF PROLONGED FRICATIVE PHONEMES WITH THE HIDDEN MARKOV MODELS APPROACH 1. INTRODUCTION JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol. 11/2007, ISSN 1642-6037 Marek WIŚNIEWSKI *, Wiesława KUNISZYK-JÓŹKOWIAK *, Elżbieta SMOŁKA *, Waldemar SUSZYŃSKI * HMM, recognition, speech, disorders

More information

A Vector Space Approach for Aspect-Based Sentiment Analysis

A Vector Space Approach for Aspect-Based Sentiment Analysis A Vector Space Approach for Aspect-Based Sentiment Analysis by Abdulaziz Alghunaim B.S., Massachusetts Institute of Technology (2015) Submitted to the Department of Electrical Engineering and Computer

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Movie Review Mining and Summarization

Movie Review Mining and Summarization Movie Review Mining and Summarization Li Zhuang Microsoft Research Asia Department of Computer Science and Technology, Tsinghua University Beijing, P.R.China f-lzhuang@hotmail.com Feng Jing Microsoft Research

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Postprint.

Postprint. http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,

More information

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X

The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,

More information

The Role of the Head in the Interpretation of English Deverbal Compounds

The Role of the Head in the Interpretation of English Deverbal Compounds The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt

More information

Large-Scale Web Page Classification. Sathi T Marath. Submitted in partial fulfilment of the requirements. for the degree of Doctor of Philosophy

Large-Scale Web Page Classification. Sathi T Marath. Submitted in partial fulfilment of the requirements. for the degree of Doctor of Philosophy Large-Scale Web Page Classification by Sathi T Marath Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at Dalhousie University Halifax, Nova Scotia November 2010

More information

Time series prediction

Time series prediction Chapter 13 Time series prediction Amaury Lendasse, Timo Honkela, Federico Pouzols, Antti Sorjamaa, Yoan Miche, Qi Yu, Eric Severin, Mark van Heeswijk, Erkki Oja, Francesco Corona, Elia Liitiäinen, Zhanxing

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

A heuristic framework for pivot-based bilingual dictionary induction

A heuristic framework for pivot-based bilingual dictionary induction 2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,

More information

Improving Machine Learning Input for Automatic Document Classification with Natural Language Processing

Improving Machine Learning Input for Automatic Document Classification with Natural Language Processing Improving Machine Learning Input for Automatic Document Classification with Natural Language Processing Jan C. Scholtes Tim H.W. van Cann University of Maastricht, Department of Knowledge Engineering.

More information

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features

Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features Sriram Venkatapathy Language Technologies Research Centre, International Institute of Information Technology

More information

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models

Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),

More information

Article A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek

Article A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek Article A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek Vasileios Athanasiou and Manolis Maragoudakis * Artificial

More information

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,

More information

Text-mining the Estonian National Electronic Health Record

Text-mining the Estonian National Electronic Health Record Text-mining the Estonian National Electronic Health Record Raul Sirel rsirel@ut.ee 13.11.2015 Outline Electronic Health Records & Text Mining De-identifying the Texts Resolving the Abbreviations Terminology

More information

Truth Inference in Crowdsourcing: Is the Problem Solved?

Truth Inference in Crowdsourcing: Is the Problem Solved? Truth Inference in Crowdsourcing: Is the Problem Solved? Yudian Zheng, Guoliang Li #, Yuanbing Li #, Caihua Shan, Reynold Cheng # Department of Computer Science, Tsinghua University Department of Computer

More information

Multivariate k-nearest Neighbor Regression for Time Series data -

Multivariate k-nearest Neighbor Regression for Time Series data - Multivariate k-nearest Neighbor Regression for Time Series data - a novel Algorithm for Forecasting UK Electricity Demand ISF 2013, Seoul, Korea Fahad H. Al-Qahtani Dr. Sven F. Crone Management Science,

More information

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines

Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,

More information

Matching Similarity for Keyword-Based Clustering

Matching Similarity for Keyword-Based Clustering Matching Similarity for Keyword-Based Clustering Mohammad Rezaei and Pasi Fränti University of Eastern Finland {rezaei,franti}@cs.uef.fi Abstract. Semantic clustering of objects such as documents, web

More information

arxiv: v1 [cs.lg] 3 May 2013

arxiv: v1 [cs.lg] 3 May 2013 Feature Selection Based on Term Frequency and T-Test for Text Categorization Deqing Wang dqwang@nlsde.buaa.edu.cn Hui Zhang hzhang@nlsde.buaa.edu.cn Rui Liu, Weifeng Lv {liurui,lwf}@nlsde.buaa.edu.cn arxiv:1305.0638v1

More information

arxiv: v1 [cs.lg] 15 Jun 2015

arxiv: v1 [cs.lg] 15 Jun 2015 Dual Memory Architectures for Fast Deep Learning of Stream Data via an Online-Incremental-Transfer Strategy arxiv:1506.04477v1 [cs.lg] 15 Jun 2015 Sang-Woo Lee Min-Oh Heo School of Computer Science and

More information

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each

More information

Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011

Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011 Detecting Wikipedia Vandalism using Machine Learning Notebook for PAN at CLEF 2011 Cristian-Alexandru Drăgușanu, Marina Cufliuc, Adrian Iftene UAIC: Faculty of Computer Science, Alexandru Ioan Cuza University,

More information

Genre classification on German novels

Genre classification on German novels Genre classification on German novels Lena Hettinger, Martin Becker, Isabella Reger, Fotis Jannidis and Andreas Hotho Data Mining and Information Retrieval Group, University of Würzburg Email: {hettinger,

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski

Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski Training a Neural Network to Answer 8th Grade Science Questions Steven Hewitt, An Ju, Katherine Stasaski Problem Statement and Background Given a collection of 8th grade science questions, possible answer

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

Exposé for a Master s Thesis

Exposé for a Master s Thesis Exposé for a Master s Thesis Stefan Selent January 21, 2017 Working Title: TF Relation Mining: An Active Learning Approach Introduction The amount of scientific literature is ever increasing. Especially

More information

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.

More information

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of

More information

Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape

Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,

More information

Emotions from text: machine learning for text-based emotion prediction

Emotions from text: machine learning for text-based emotion prediction Emotions from text: machine learning for text-based emotion prediction Cecilia Ovesdotter Alm Dept. of Linguistics UIUC Illinois, USA ebbaalm@uiuc.edu Dan Roth Dept. of Computer Science UIUC Illinois,

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES

PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES PREDICTING SPEECH RECOGNITION CONFIDENCE USING DEEP LEARNING WITH WORD IDENTITY AND SCORE FEATURES Po-Sen Huang, Kshitiz Kumar, Chaojun Liu, Yifan Gong, Li Deng Department of Electrical and Computer Engineering,

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH

CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH ISSN: 0976-3104 Danti and Bhushan. ARTICLE OPEN ACCESS CLASSIFICATION OF TEXT DOCUMENTS USING INTEGER REPRESENTATION AND REGRESSION: AN INTEGRATED APPROACH Ajit Danti 1 and SN Bharath Bhushan 2* 1 Department

More information

P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou, C. Skourlas, J. Varnas

P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou, C. Skourlas, J. Varnas Exploiting Distance Learning Methods and Multimediaenhanced instructional content to support IT Curricula in Greek Technological Educational Institutes P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou,

More information

SEMAFOR: Frame Argument Resolution with Log-Linear Models

SEMAFOR: Frame Argument Resolution with Log-Linear Models SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon

More information

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard

Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA. 1. Introduction. Alta de Waal, Jacobus Venter and Etienne Barnard Chapter 10 APPLYING TOPIC MODELING TO FORENSIC DATA Alta de Waal, Jacobus Venter and Etienne Barnard Abstract Most actionable evidence is identified during the analysis phase of digital forensic investigations.

More information