Comparison of Neural Network Architectures for Sentiment Analysis of Russian Tweets


Comparison of Neural Network Architectures for Sentiment Analysis of Russian Tweets

Speaker: Konstantin Arkhipenko 1,2 (arkhipenko@ispras.ru)
Ilya Kozlov 1,3, Julia Trofimovich 1, Kirill Skorniakov 1,3, Andrey Gomzin 1,2, Denis Turdakov 1,2,4

1 Institute for System Programming of RAS, Moscow, Russia
2 Lomonosov Moscow State University, CMC faculty, Moscow, Russia
3 MIPT, Dolgoprudny, Russia
4 FCS NRU HSE, Moscow, Russia

June 2, 2016

Contents
1 SentiRuEval-2016 task overview
  - The task
  - The data
  - The metrics
2 Why neural networks?
  - Word embeddings
  - Baseline: SVM + domain adaptation
  - CNN-based solution
  - RNN-based solution
  - Evaluation results
3 Conclusion and future work

SentiRuEval-2016: The task

Object-oriented sentiment analysis of Russian tweets: given a tweet t_i and a set O_{t_i} ⊆ O of objects mentioned in t_i, mark t_i as negative, neutral, or positive towards each o ∈ O_{t_i}.

SentiRuEval-2016: The data

Two domains: banks and telecommunication companies (TC)
- Train: banks, 9392 tweets; TC, 8643 tweets
- Test: banks, 19586 tweets (3313 of them used for evaluation); TC, 19673 tweets (2247 of them used for evaluation)
- Imbalanced: 65% of the tweets in the train data are neutral

SentiRuEval-2016: The metrics

precision = true positive marks / (true positive marks + false positive marks)
recall = true positive marks / (true positive marks + false negative marks)
F1 = 2 * precision * recall / (precision + recall)

The F1-score, macro-averaged over the negative and positive classes, is used for evaluation:

F1_macro = 0.5 * (F1_negative + F1_positive)
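As a hedged check of this metric (an assumption about tooling, not the organizers' scorer), the same macro F1 over only the negative and positive classes can be computed with scikit-learn:

```python
from sklearn.metrics import f1_score

# Toy labels; "neu" is present but excluded from the macro average
y_true = ["neg", "neu", "pos", "pos", "neg", "neu"]
y_pred = ["neg", "pos", "pos", "neu", "neg", "neu"]

# Equivalent to F1_macro = 0.5 * (F1_negative + F1_positive)
score = f1_score(y_true, y_pred, labels=["neg", "pos"], average="macro")
print(score)  # 0.75: F1_negative = 1.0, F1_positive = 0.5
```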

We focused on determining the overall sentiment of the whole tweet: given a tweet t_i and a set O_{t_i} ⊆ O of objects mentioned in t_i, determine the sentiment s_{t_i} ∈ {negative, neutral, positive} of t_i and, for each o ∈ O_{t_i}, mark t_i as s_{t_i} towards o.

Why neural networks?

- Modern NN architectures (e.g. recurrent neural networks) achieve state-of-the-art results on many NLP problems, outperforming shallow machine learning approaches
- Many powerful, efficient, easy-to-use deep learning libraries have evolved over the last few years

Word embeddings: word2vec

- Introduced by Tomas Mikolov (now at Facebook AI Research)
- Maps words into a vector space
- Based on a simple feed-forward neural network
- Captures syntactic and semantic regularities
- Helps to overcome data sparsity in our task

Word embeddings: word2vec

We trained word2vec on 3.3 GB of Web user comments from:
- ВКонтакте (https://vk.com/)
- Эхо Москвы (http://echo.msk.ru/)
- Свободная Пресса (http://svpressa.ru/)

The following parameters were used (see the training sketch below):
- Continuous Bag-of-Words architecture
- 10 negative samples for every prediction word
- word embedding dimensionality of 200
- 5 training iterations over the corpus
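These parameters map directly onto the gensim library's Word2Vec class. A minimal sketch, assuming gensim >= 4.0 and an iterable `comments` of tokenized comments (the authors' actual tooling is not stated on the slide):

```python
from gensim.models import Word2Vec

# `comments` is assumed: an iterable of tokenized user comments,
# e.g. [["эхо", "москвы", ...], ...]
model = Word2Vec(
    comments,
    sg=0,             # Continuous Bag-of-Words architecture
    negative=10,      # 10 negative samples for every prediction word
    vector_size=200,  # embedding dimensionality of 200
    epochs=5,         # 5 training iterations over the corpus
)
print(model.wv["банк"].shape)  # (200,)
```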

Baseline: SVM + domain adaptation

For every tweet:
- convert it to the sequence of corresponding word2vec word embeddings; punctuation and words that are not in the word2vec vocabulary are discarded
- form a tweet embedding by averaging the vectors in this sequence, and feed it into a support vector machine (SVM) classifier

Domain adaptation: we discovered that the source domain (train data) and the target domain (test data) are drawn from different probability distributions.
- Sample reweighting: give higher weights to samples that look like target samples and don't look like source samples

A minimal sketch of this baseline follows.
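The sketch assumes a trained word2vec `model` (as above) and tokenized tweets; `train_tweets`, `y_train` and the reweighting `weights` are illustrative names, and the actual reweighting scheme is not shown on the slide:

```python
import numpy as np
from sklearn.svm import SVC

def tweet_embedding(tokens, wv, dim=200):
    # Discard punctuation and out-of-vocabulary words, average the rest
    vectors = [wv[t] for t in tokens if t in wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

X = np.array([tweet_embedding(t, model.wv) for t in train_tweets])
clf = SVC()
# Domain adaptation via sample reweighting: higher weights for training
# samples that resemble the target (test) domain
clf.fit(X, y_train, sample_weight=weights)
```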

CNN-based solution

For every tweet:
- form a tweet embedding (by averaging, as in the baseline)
- also form an additional tweet embedding by taking the element-wise maximum of all word embeddings in the sequence
- concatenate these tweet embeddings and feed the result into a convolutional neural network (CNN)

Convolutional neural network (sketched below):
- convolutional layer with 8 kernels of width 10
- dense layer: 3 neurons with softmax activation that predict the probabilities of each class (negative, neutral and positive)
- 10 training epochs

Yes, this solution is quite silly... (but not all possible CNN-based approaches are)
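A hedged Keras sketch of this CNN; how the 400-dim concatenated vector is shaped for the convolutional layer is our assumption, since the slide does not say (here it is treated as a length-400 sequence with one channel), and the activation is assumed:

```python
from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense

model = Sequential([
    Conv1D(8, 10, activation="relu", input_shape=(400, 1)),  # 8 kernels of width 10
    Flatten(),
    Dense(3, activation="softmax"),  # negative / neutral / positive
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(X.reshape(-1, 400, 1), y_onehot, epochs=10)
```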

RNN-based solution

Recurrent neural networks: Gated Recurrent Unit (GRU). RNNs are well suited to processing sequence data (see http://colah.github.io/posts/2015-08-understanding-lstms/).

RNN-based solution

The neural network takes the sequence of word2vec embeddings of the words in the tweet as input.

NN architecture (sketched below):
- two GRU cells with input/output dimensionality of 200; dropout is applied to the output of the second cell
- dense layer: 3 neurons with softmax activation that predict the probabilities of each class

Implemented using the Keras library (http://keras.io/) in only 200 lines of code; 20 training epochs, batch size of 8.
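A minimal Keras sketch of this architecture; the dropout rate and the optimizer are assumptions (the slide gives neither), and variable-length tweets would need padding in practice:

```python
from keras.models import Sequential
from keras.layers import GRU, Dropout, Dense

model = Sequential([
    # First GRU cell returns the full sequence so the second can consume it
    GRU(200, return_sequences=True, input_shape=(None, 200)),
    GRU(200),      # second GRU cell; only its final output is kept
    Dropout(0.5),  # dropout on the output of the second cell (rate assumed)
    Dense(3, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(padded_sequences, y_onehot, epochs=20, batch_size=8)
```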

Evaluation results: Macro F1-score

Solution    Banks domain / Rank    TC domain / Rank
CNN         0.4832 / 21st          0.4704 / 41st
GRU         0.5517 / 1st           0.5594 / 1st
Ensemble*   0.5352 / 2nd           0.5403 / 9th

* combination of the CNN, GRU, and baseline (SVM + domain adaptation) solutions

Conclusion and future work

- Our CNN-based solution is very silly
- We are not deep learning experts (yet)
- We had little time for the competition
- We did not use any lexicons, and performed very little preprocessing
- We did not explore hyperparameter values properly
- However, we have won the competition

Next year we are going to improve the results significantly:
- discover optimal NN architectures and find better hyperparameters
- use domain adaptation in neural networks
- ...

Thank you! Questions?