Classification of News Articles Using Named Entities with Named Entity Recognition by Neural Network


Nick Latourette and Hugh Cunningham

1. Introduction

Our paper investigates the use of named entities as features for the classification of news articles by topic. Prior work with The McClatchy Company, and in particular The Sacramento Bee, informs us that categorization of news articles poses a significant challenge to newspapers interested in determining the interests of their users. Given the large volume of news articles produced (past, present, and future), an automated approach to this work is desirable. We implement a system for article classification using the named entities contained within the article as the basis for classification.

Many algorithms for text classification, naive Bayes classification for instance, use the words within the text as features for classification. For highly similar categories or topics, like current coverage of civil strife in Syria and Egypt, the words used by articles in either category may be very similar (e.g., words like "violence" or "military" may be equally likely to appear in an article about Egypt as in an article about Syria). We hypothesize that, for such closely related topics, using named entities for classification will perform better than other approaches. Intuitively, if the named entity "Assad" appears within an article, then it is very likely that the article should be classified as belonging to the category Syria.

We use a set of 200 articles published by TIME as our data set: 100 articles categorized as covering Syria and 100 as covering Egypt. The selection of TIME as a source is based only on the convenient access to a large number of pre-categorized articles there. Based on the relative similarity of the topics of civil strife in Syria and Egypt, we select these categories as an appropriate sample for our investigation.

Before utilizing named entities as features for classification, we need to address the non-trivial problem of Named Entity Recognition (NER). We chose to implement our own neural network approach because of its proven success with respect to natural language classification problems, its easy parallelizability, and because we were interested in implementing a deep learning algorithm. Note that we borrowed the implementation details of the neural network from PA4 of CS224n. After identifying named entities in the sample, we train a naive Bayes classifier.

2. Prior Work

Gui et al. provide some precedent for the use of named entities in the classification of news articles [1]. Their research is on the use of named entities for classifying news articles within hierarchical categories. Of particular interest is their focus on what they refer to as "Close Categories": highly similar categories within the same hierarchy, e.g., presidential elections in different countries. Gui et al. do not describe the techniques used for NER or extraction of named entities, but found that an SVM trained on named entities outperformed one trained on terms, where "terms" refers to words or phrases within the articles. Beyond this particular utilization of named entities for classification by Gui et al., there is extensive work on the general problem of Named Entity Recognition as well as on the problem of text classification [2][3].

3. Neural Network

Knowing that the neural network would be the most challenging part of our project in terms of implementation, we decided to start with a single hidden layer and move on from there. In our network, W and U are the matrices representing the linear transformations performed on the inputs to the hidden layer and on the final classification, respectively. In addition to the linear transformations, each node in the hidden layer also performs a non-linear transformation on its input. Note that if we did not do this, the whole neural network would just be performing one big linear transformation on the data, which would be considerably less powerful in terms of the classifications it could represent.
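As a minimal sketch of this architecture (illustrative variable names, not our exact implementation, which followed CS224n PA4), the forward pass can be written in NumPy as follows:

import numpy as np

def forward(x, W, U):
    # x: concatenation of the context window's 50-dimensional word vectors (length nC)
    # W: H x nC hidden-layer weights; U: length-H output weights
    a = np.tanh(W @ x)                    # non-linear hidden layer
    h = 1.0 / (1.0 + np.exp(-(U @ a)))    # sigmoid output: P(center word is a person)
    return a, h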

To represent our words as vectors, we use a dictionary of approximately 100,000 words, where each word is represented by a 50-dimensional vector. Each dimension is a weighted feature trained by an unsupervised learning method that tries to capture a word's syntactic and semantic information along with the context in which the word is normally used [4]. Let us call this dictionary L. To decide whether a word refers to a person or not, we use the word's vector representation from dictionary L, along with the vector representations of the c-1 words surrounding that word, as input to our neural network.

To train our model we used stochastic gradient descent to minimize our cost function J, using the following equations, where a denotes the hidden activations, h the predicted probability, H the hidden layer size, nC the input dimension, m the number of training examples, and R the regularization constant:

a = \tanh(W x), \quad h = \mathrm{sigmoid}(U a)

J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h^{(i)} + (1 - y^{(i)}) \log(1 - h^{(i)}) \right] + \frac{R}{2m} \left[ \sum_{k=1}^{H} \sum_{j=1}^{nC} W_{k,j}^2 + \sum_{k=1}^{H} U_k^2 \right]

\frac{\partial J}{\partial U} = \frac{1}{m} \sum_{i=1}^{m} (h^{(i)} - y^{(i)}) \, a^{(i)} + \frac{R}{m} U

z = \left[ U_1 (1 - a_1^2), \; U_2 (1 - a_2^2), \; U_3 (1 - a_3^2), \; \ldots \right]^T

\frac{\partial J}{\partial W} = \frac{1}{m} \sum_{i=1}^{m} (h^{(i)} - y^{(i)}) \, z^{(i)} (x^{(i)})^T + \frac{R}{m} W

\frac{\partial J}{\partial x_j} = (h - y) \sum_{k=1}^{H} U_k (1 - a_k^2) W_{k,j}

Training and deciding our parameters for the neural network

Our data set to train and test the neural network consisted of blocks of text in which each word is labeled with a 0 or a 1, a one indicating that the word pertains to a person and a zero indicating that it does not. In total, our blocks of text contained 200,000 labeled words. In order to properly train our model and tune the parameters, which consisted of the learning rate, context size, hidden layer size, the number of iterations of the gradient descent algorithm, and our choice of the regularization constant R, we implemented k-fold cross-validation, with k chosen to be 20. From the models we tested, we chose the one with the best average F1 score whose runtime was below a certain threshold. Our resulting choice was as follows:

Context Size = 5
Learning Rate = 0.001
Iterations = 5
Hidden Network Size = 50
Regularization Constant = 0.001

Average Precision of Model = 0.8200
Average Recall of Model = 0.6205
Average F1 Score of Model = 0.7064
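For concreteness, a single stochastic gradient descent step implementing the gradient equations above might look as follows in NumPy (a sketch with illustrative names; bias terms are omitted as in the equations, and m scales only the regularization term):

import numpy as np

def sgd_step(x, y, W, U, lr=0.001, R=0.001, m=1):
    # Forward pass
    a = np.tanh(W @ x)
    h = 1.0 / (1.0 + np.exp(-(U @ a)))
    err = h - y                          # residual (h - y) for this example
    # Gradients from the equations above, with L2 terms scaled by R/m
    dU = err * a + (R / m) * U
    z = U * (1.0 - a ** 2)               # z_k = U_k (1 - a_k^2)
    dW = err * np.outer(z, x) + (R / m) * W
    dx = err * (W.T @ z)                 # gradient w.r.t. the input word vectors
    # Update the parameters; dx is propagated back into the word vectors in L
    U -= lr * dU
    W -= lr * dW
    x -= lr * dx
    return W, U, x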

4. Classification

We implement a naive Bayes classifier, limiting the vocabulary to consist only of the named entities identified by the neural network. The performance of this classifier is compared to a typical naive Bayes classifier whose vocabulary consists of all words encountered in the sample articles (common words like "a", "as", etc. excluded); a library-based sketch of this comparison appears after Section 5. We partition our data randomly into five folds of forty articles each for cross-validation and use each fold as the validation set once, for both the named-entity naive Bayes classifier and the baseline naive Bayes classifier. The baseline naive Bayes classifier correctly identified the category of the articles in the validation subset with an average accuracy of 73%. Our naive Bayes classifier using a vocabulary consisting of named entities performed worse, correctly categorizing articles in the validation set with an average accuracy of only 67%.

5. Conclusion and Future Work

Currently, we use named entity mentions rather naively. If many articles of a topic X mention person Y in our training set, then we are more likely to classify any article that mentions Y as belonging to topic X. However, if our training set does not have any mentions of person Y, then identifying that an article mentions Y is useless during classification. Several features of our data set may also have contributed to the poor performance of our classifier: given the U.S.-centric perspective of many articles, the entities Barack Obama and John Kerry appear in many articles from both categories, so these entities hinder rather than aid correct classification. Additionally, many articles contain named entities not mentioned in any other article in the data set, such as persons interviewed by the author, and the identification of these entities does not aid classification at all.

The performance of our classifier could be improved by disambiguating named entities to associate them with their real-world identities. This would allow us to discard entities not directly associated with either category. Our classification could also be improved by a more robust means of named entity recognition that would identify organizations or nations as named entities rather than only people.
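As a supplementary illustration of the comparison in Section 4, the following sketch uses scikit-learn equivalents of our classifiers (our actual implementation was our own; the article, label, and entity variables here are hypothetical):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

def mean_accuracy(texts, labels, vocabulary=None):
    # With vocabulary=None, all words are counted (common stop words excluded);
    # otherwise counting is restricted to the given named-entity terms.
    X = CountVectorizer(vocabulary=vocabulary, stop_words="english").fit_transform(texts)
    return cross_val_score(MultinomialNB(), X, labels, cv=5).mean()

# Hypothetical usage: articles holds the 200 TIME article texts, labels is
# "Syria" or "Egypt", and entities holds single-token entity names from the
# NER step (multi-word entities would need a matching ngram_range).
# baseline    = mean_accuracy(articles, labels)            # full vocabulary, ~0.73
# entity_only = mean_accuracy(articles, labels, entities)  # entities only,  ~0.67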

6. References

[1] Gui, Yaocheng, et al. "Hierarchical Text Classification for News Articles Based on Named Entities." Advanced Data Mining and Applications. Springer Berlin Heidelberg, 2012. 318-329.
[2] McCallum, Andrew, and Kamal Nigam. "A Comparison of Event Models for Naive Bayes Text Classification." AAAI-98 Workshop on Learning for Text Categorization. Vol. 752. 1998.
[3] Nadeau, David, and Satoshi Sekine. "A Survey of Named Entity Recognition and Classification." Lingvisticae Investigationes 30.1 (2007): 3-26.
[4] Huang, Eric H., et al. "Improving Word Representations via Global Context and Multiple Word Prototypes." Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1. Association for Computational Linguistics, 2012.