Spotting Sentiments with Semantic Aware Multilevel Cascaded Analysis

Despoina Chatzakou, Nikolaos Passalis, Athena Vakali
Aristotle University of Thessaloniki
Big Data Analytics and Knowledge Discovery (DaWaK), 2015

Outline
1. Introduction
2. Background
   - RNTN: Recursive Neural Tensor Network
   - Dictionary learning
3. Problem definition
4. Proposed approach
   - Word level analysis
   - Sentence level analysis
   - MultiSpot pipeline
5. Experiments
   - Dataset
   - Fundamental Characteristics
   - Results
6. Conclusions

Content generated on the Web

Numerous individuals express opinions and feelings on the Web. The continuous use of popular social networks and Web 2.0 technologies has increased the need to understand the crowd's opinions.

Capturing sentiment out of textual resources (1/2)

Machine learning is a popular approach for spotting the sentiment expressed in documents (i.e. document-based sentiment analysis). Typically, document-based sentiment analysis operates at a single level:
- Fine-grained approach: word-level processing (sentiment-based, syntactic-based, semantic-based).
- Coarse-grained approach: sentence-level processing.

Extracting sentiment only from isolated sets of words, or only from whole sentences, leads to information loss.

Capturing sentiment out of textual resources (2/2)

[figure]

Our goals

Given: a set of documents D, a sentiment label, and a representation for each document.
Predict: the expressed sentiment for any new document.

G1. Effectively exploit diverse information from each individual sentence of a document.
G2. Design an effective approach for combining information arising from different text levels.

Recursive Neural Tensor Network

RNTN is suitable for capturing the compositional effects in sentences:
- It learns a semantic vector space and generates a parse tree to represent a document at different levels.
- Each sentence is represented by a semantic information vector.
- It can classify individual sentences and produce a sentiment probability distribution vector.

In our case:
- A 5-value sentiment probability distribution vector is produced (1: very negative, 2: negative, 3: neutral, 4: positive, 5: very positive).
- Sentiment probability distribution: sent_i(s), where i = 1, ..., 5 and s is a sentence of a document.
- Semantic space vector: vec(s).
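As a rough illustration of this per-sentence interface, the sketch below fakes the two outputs with hash-based stand-ins; the names SEMANTIC_DIM, _token_vec, vec and sent are ours, and a trained RNTN would replace them in practice.

```python
import numpy as np

SEMANTIC_DIM = 25  # dimensionality of vec(s); an assumed value

def _token_vec(token):
    # Deterministic pseudo-random vector per token (NOT a learned space).
    rng = np.random.default_rng(abs(hash(token)) % (2**32))
    return rng.standard_normal(SEMANTIC_DIM)

def vec(sentence_tokens):
    """Semantic space vector vec(s) of one sentence (stand-in)."""
    return np.mean([_token_vec(t) for t in sentence_tokens], axis=0)

def sent(sentence_tokens):
    """5-value sentiment distribution sent_i(s), i = 1..5 (stand-in).
    Index 0 = very negative ... index 4 = very positive."""
    logits = np.random.default_rng(len(sentence_tokens)).standard_normal(5)
    p = np.exp(logits - logits.max())
    return p / p.sum()  # sums to 1, like the RNTN's output distribution

s = "a surprisingly touching film".split()
print(vec(s).shape, sent(s))  # (25,) and a length-5 probability vector
```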

Dictionary learning (1/2)

- A clustering process is used to construct a dictionary.
- Clustering is applied to the feature vectors that represent the documents' sentences.
- Each word of the dictionary corresponds to a set of similar feature vectors.
- K-means and its variants are usually used to perform the clustering.
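A minimal sketch of this step, assuming scikit-learn's KMeans and random placeholder sentence vectors (the cluster count of 100 and the data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(1).standard_normal((1000, 25))  # sentence vectors

# 10 restarts, 15 iterations each, keeping the minimum-inertia run
# (matching the clustering setup reported later in the experiments).
kmeans = KMeans(n_clusters=100, n_init=10, max_iter=15).fit(X)
dictionary = kmeans.cluster_centers_  # one centroid per dictionary word

# Encoding: a new feature vector is mapped to its nearest dictionary word.
word_index = kmeans.predict(X[:1])    # index of the closest centroid
```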

Dictionary learning (2/2)

Encoding of a new document. [figure]

Problem definition

Multilevel sentiment analysis
Given: a training dataset D, a word-level feature extractor f(w), a sentence-level feature extractor g(s), and a sentiment label t_i ∈ {pos, neg} for each document;
Extract: the word-level and sentence-level features of each document d.
Predict: the sentiment of any new document d_test ∉ D.

Word level analysis (1/3)

Step 1. Feature extraction: word-to-vector mapping.
Step 2. Aggregation of the word vectors into a document-level vector via a weighting scheme (e.g. term frequency, tf).

Step 1. Feature extraction: word-to-vector mapping.
- A word w is modeled as a vector v ∈ R^k, where k equals the size of the dictionary used. All elements of v are zero except the one that corresponds to the word w.
- Word-level features: both Bag of Words (BoW) and Naive Bayes bigram (NB) features were examined.

Word level analysis (2/3)

Step 2. Word vector aggregation.
- All the vectors of the words in a document are combined into a single vector that describes the whole document.

Word-level feature extractor: the extractor f maps each word w of a document d to a vector f(w) ∈ R^l, where l is the dimensionality of the output vector.

Word level analysis (3/3)

Aggregation example (BoW features). Dictionary: {bag, of, words}.

| Word  | Vector  |
|-------|---------|
| bag   | (1,0,0) |
| of    | (0,1,0) |
| words | (0,0,1) |

Binary weighting scheme: vector of the phrase "bag bag words" is (1,0,1).
Term-frequency weighting scheme: vector of the phrase "bag bag words" is (2,0,1).
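The example above can be reproduced directly; the aggregate helper below is ours, not part of the paper:

```python
from collections import Counter

dictionary = ["bag", "of", "words"]
index = {w: i for i, w in enumerate(dictionary)}

def aggregate(tokens, scheme="tf"):
    # Sum the one-hot word vectors (tf) or OR them together (binary).
    counts = Counter(t for t in tokens if t in index)
    v = [0] * len(dictionary)
    for word, c in counts.items():
        v[index[word]] = c if scheme == "tf" else 1
    return v

phrase = "bag bag words".split()
print(aggregate(phrase, "binary"))  # [1, 0, 1]
print(aggregate(phrase, "tf"))      # [2, 0, 1]
```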

Sentence level analysis (1/6)

Step 1. Feature extraction.
Step 2. Aggregation phase under a weighting scheme.

Step 1. Feature extraction.
- The RNTN model is used as the sentence-level feature extractor, yielding the sentiment distribution vector sent(s) and the semantic vector vec(s).
- Other methods could also be used to extract the sentiment distribution and the semantic vector of each sentence; RNTN was the first to achieve 85.4% accuracy on binary sentence-level classification.

Sentence level analysis (2/6)

Step 2. Aggregation phase. Two approaches were examined:
- Sentiment center estimation.
- Semantic center estimation.

Sentence level analysis: Sentiment center estimation (3/6)

Sentiment center vector:
$$\mathrm{sent}_{\mathrm{center}}(d) = \frac{1}{|d|} \sum_{s \in d} \mathrm{sent}(s)$$
where |d| is the number of sentences of document d.

Sentiment variance vector:
$$\mathrm{sent}_{\mathrm{var}}(d) = \frac{1}{|d|} \sum_{s \in d} \left(\mathrm{sent}(s) - \mathrm{sent}_{\mathrm{center}}(d)\right)^2$$
which contains the squared differences from the document's center.

Sentence level analysis: Semantic center estimation (4/6)

Semantic center vector:
$$\mathrm{vec}_{\mathrm{center}}(d) = \frac{1}{|d|} \sum_{s \in d} \mathrm{vec}(s)$$
where |d| is the number of sentences of document d.

Semantic variance vector:
$$\mathrm{vec}_{\mathrm{var}}(d) = \frac{1}{|d|} \sum_{s \in d} \left(\mathrm{vec}(s) - \mathrm{vec}_{\mathrm{center}}(d)\right)^2$$
which contains the squared differences from the document's center.
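Both center estimations reduce to the same mean/variance computation; a sketch in NumPy (the function name and the sample distributions are ours):

```python
import numpy as np

def center_and_variance(sentence_vectors):
    S = np.asarray(sentence_vectors)        # one row per sentence
    center = S.mean(axis=0)                 # sent_center(d) / vec_center(d)
    var = ((S - center) ** 2).mean(axis=0)  # sent_var(d) / vec_var(d)
    return center, var

# e.g. a three-sentence document with 5-value sentiment distributions:
dists = [[.1, .2, .4, .2, .1],
         [.0, .1, .3, .4, .2],
         [.2, .3, .3, .1, .1]]
center, variance = center_and_variance(dists)
```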

Sentence level analysis: Semantic CenterBook (5/6)

Builds on vectors that merge semantically similar sentences.

CenterBook process. Given a training set of documents D = {d_1, d_2, ..., d_n}:
1. Split all documents into a set of sentences.
2. Cluster the set of all sentences appearing in D.
Output: a collection of clusters.

Sentence level analysis: Semantic CenterBook (6/6)

Each sentence in a document is represented by its nearest cluster; the overall document is modeled over the set of centroids.

Sentence encoding function h(s):
$$h(s) = y(s), \qquad y_i = \begin{cases} 1, & i = \arg\min_j \lVert c_j - \mathrm{vec}(s) \rVert_2^2 \\ 0, & \text{otherwise} \end{cases}$$
where $y_i$ is the i-th element of the vector $y(s)$.

Document representation (CenterBook):
$$\mathrm{code}(d) = \sum_{s \in S_d} h(s)$$
where s is each sentence of a document d and $S_d$ is the set of all sentences of document d.
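A sketch of this encoding in NumPy, assuming centroids produced by the sentence clustering (the random placeholders and the function name code are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
centroids = rng.standard_normal((100, 25))       # c_1 ... c_100

def code(document_sentence_vecs):
    c = np.zeros(len(centroids))
    for v in document_sentence_vecs:
        # h(s): one-hot activation of the centroid nearest to vec(s).
        nearest = np.argmin(np.linalg.norm(centroids - v, axis=1))
        c[nearest] += 1
    return c

doc = rng.standard_normal((7, 25))               # 7 sentences
print(code(doc).sum())                           # 7.0: one hit per sentence
```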

MultiSpot pipeline (1/2)

Given a document d:
Phase 1. Word level analysis (fine-grained word features).
Phase 2. Sentence level analysis (coarse-grained sentence features).
Phase 3. Combination of the aggregated word-level and sentence-level features (a sketch follows the pipeline figure below).
Spot the document's sentiment.

MultiSpot pipeline (2/2)

[figure]
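A minimal sketch of Phase 3, assuming scikit-learn's LinearSVC and random placeholder feature matrices; the variable names are ours, and the real pipeline feeds the features built in the previous slides:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
bow_feats = rng.random((200, 50))    # word-level features (e.g. BoW)
sent_feats = rng.random((200, 30))   # sentence-level features (centers etc.)
labels = rng.integers(0, 2, 200)     # pos / neg

X = np.hstack([bow_feats, sent_feats])  # multilevel combination
clf = LinearSVC().fit(X, labels)
prediction = clf.predict(X[:1])         # spot a document's sentiment
```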

Dataset overview

Both datasets contain movie reviews collected from the Internet Movie Database (IMDB).

| Dataset | # Reviews | Pos / Neg |
|---------|-----------|-----------|
| Large Movie Review Dataset (IMDB) | 50k + 50k (unlabeled) | 50% - 50% |
| Polarity dataset v2.0 (RT-2k) | 2,000 | 50% - 50% |

Fundamental Characteristics

- Extracted word-level features: BoW (top 10,000 unigrams) and NB bigrams. Weighting scheme: term frequency for the IMDB dataset, binary weighting for RT-2k.
- Extracted sentence-level features: sentiment distribution and semantic vector based on the RNTN model.
- Clustering: k-means for 15 iterations, repeating the clustering process 10 times and selecting the minimum-energy configuration.
- Classification: linear SVM, with the best SVM model selected by 10-fold cross validation.
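A sketch of the SVM model selection just described, assuming scikit-learn; the C grid and the placeholder data are our assumptions, as the slides do not list the searched values:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
X = rng.random((100, 20))            # placeholder document features
y = rng.integers(0, 2, 100)          # placeholder pos / neg labels

# Best linear SVM chosen by 10-fold cross validation.
search = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10]}, cv=10)
best_svm = search.fit(X, y).best_estimator_
```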

CenterBook evaluation

How is the classification quality affected by:
Q1. the number of clusters?
Q2. the size of the available training data?

[Figure: smoothed accuracy/recall (TPR)/specificity (TNR) curves vs. number of clusters (0-1000), and smoothed accuracy curves vs. training dataset size for 50, 100, and 200 clusters. Results for the IMDB dataset.]

Evaluation of sentence-based approaches

Parameters: 200 clusters for the IMDB dataset and 100 clusters for the RT-2k dataset.

| Features | IMDB Accuracy | IMDB Recall | IMDB F1 | RT-2k Accuracy | RT-2k Recall | RT-2k F1 |
|---|---|---|---|---|---|---|
| Rule-based | 76.21 | 56.36 | 70.32 | 59.70 | 19.80 | 32.95 |
| Sentiment Center | 84.02 | 80.97 | 83.52 | 83.30 | 86.10 | 83.75 |
| Sentiment Center (var) | 84.01 | 80.96 | 83.51 | 83.05 | 86.30 | 83.58 |
| Semantic Center | 84.90 | 83.45 | 84.68 | 85.10 | 86.80 | 85.35 |
| Semantic Center (var) | 85.27 | 83.53 | 85.01 | 84.90 | 85.50 | 84.99 |
| CenterBook | 84.76 | 85.05 | 84.80 | 83.15 | 84.10 | 83.34 |
| Sentiment Center (var) + Semantic Center | 85.27 | 83.53 | 85.01 | 85.05 | 85.40 | 85.10 |
| Sentiment Center (var) + Semantic Center + CenterBook | 85.35 | 84.30 | 85.20 | 84.85 | 87.10 | 85.18 |

Approaches that involve semantic features yield better classification results.

Evaluation of the MultiSpot method using BoW and NB bigrams features (1/2)

| Features | IMDB Accuracy | IMDB Recall | IMDB F1 | RT-2k Accuracy | RT-2k Recall | RT-2k F1 |
|---|---|---|---|---|---|---|
| BoW | 87.77 | 88.01 | 87.80 | 87.15 | 88.40 | 87.31 |
| BoW + Sentiment Center | 88.99 | 89.01 | 88.99 | 87.45 | 88.70 | 87.60 |
| BoW + Semantic Center (var) | 89.36 | 89.18 | 89.34 | 88.20 | 89.60 | 88.36 |
| BoW + CenterBook | 89.29 | 89.26 | 89.29 | 88.85 | 90.80 | 89.06 |
| BoW + Sentiment Center + Semantic Center (var) | 89.38 | 89.22 | 89.36 | 88.25 | 89.70 | 88.42 |
| BoW + Sentiment Center + Semantic Center (var) + CenterBook | 89.48 | 89.19 | 89.45 | 89.05 | 91.50 | 89.31 |

1.71% improvement for the IMDB dataset and 1.9% improvement for the RT-2k dataset.

| Features | IMDB Accuracy | IMDB Recall | IMDB F1 | RT-2k Accuracy | RT-2k Recall | RT-2k F1 |
|---|---|---|---|---|---|---|
| NB bi | 91.43 | 92.13 | 91.49 | 89.45 | 90.80 | 89.59 |
| NB bi + Sentiment Center | 91.72 | 91.77 | 91.73 | 90.00 | 91.30 | 90.13 |
| NB bi + Semantic Center (var) | 91.76 | 91.49 | 91.73 | 90.90 | 91.90 | 90.99 |
| NB bi + CenterBook | 91.72 | 91.90 | 91.73 | 91.30 | 93.50 | 91.49 |
| NB bi + Sentiment Center + Semantic Center (var) | 91.78 | 91.53 | 91.76 | 90.85 | 91.90 | 90.95 |
| NB bi + Sentiment Center + Semantic Center (var) + CenterBook | 91.60 | 91.33 | 91.58 | 90.65 | 93.10 | 90.87 |

0.35% improvement for the IMDB dataset and 1.85% improvement for the RT-2k dataset.

Evaluation of the MultiSpot method using BoW and NB bigrams features (2/2)

Observations:
- The combination of word- and sentence-level features improves classification accuracy.
- The quality of the word-level features significantly affects the overall classification accuracy.

Friedman test: used to explore differences in treatments across multiple test attempts.
Null hypothesis: multilevel cascaded sentiment analysis does not increase the accuracy of the baseline (BoW / NB bigrams) classifier. Rejected.
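A sketch of how such a test can be run with SciPy's friedmanchisquare; the accuracy values are invented for illustration only:

```python
from scipy.stats import friedmanchisquare

# Matched accuracy measurements of the baseline and two cascaded
# variants over repeated test attempts (illustrative numbers).
baseline  = [87.7, 88.1, 87.5, 88.0, 87.9]
variant_a = [89.3, 89.6, 89.1, 89.5, 89.4]
variant_b = [89.5, 89.4, 89.2, 89.6, 89.3]

stat, p = friedmanchisquare(baseline, variant_a, variant_b)
print(p)  # a small p-value rejects "all treatments perform alike"
```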

Comparison of MultiSpot with state-of-the-art approaches

| Method | IMDB | RT-2k |
|---|---|---|
| MultiSpot: NB bi + CenterBook | 91.72 | 91.30 |
| MultiSpot: NB bi + Sentiment Center (var) + Semantic Center (var) | 91.78 | 90.85 |
| Full + Unlabeled + BoW (Maas2011) | 88.89 | 88.90 |
| BoW SVM (Pang2004) | - | 87.15 |
| tf-idf (Martineau2009) | - | 88.10 |
| Appr. Taxonomy (Whitelaw2005) | - | 90.20 |
| Word Repr. RBM + BoW (Dahl2012) | 89.23 | - |
| NB SVM bigrams (Wang2012) | 91.22 | 89.45 |
| Paragraph Vector (Le2014) | 92.58 | - |

RT-2k: exceeds the previous best classification accuracy by 1.1%.
IMDB: surpasses the previous best classification accuracy by 0.8% (excluding the paragraph vector method).

Conclusions

G1. Effectively exploit diverse information from each individual sentence of a document.
- Achieved by exploiting sentiment and/or semantic information via the center-based methodologies.

G2. Design an effective approach for combining information arising from different text levels.
- MultiSpot is an effective pipeline that combines both word- and sentence-level information.

Questions?

Appendix: CenterBook evaluation

Evaluation of the k-means objective function. [Figure: results for the IMDB dataset.]

Appendix: Tree structure

[figure]