The sentiment features of MD&As and financial misstatement prediction


A comparison of deep learning and text mining approaches for textual analysis
Ting Sun, Yue Liu, Miklos A. Vasarhelyi
Presented by Ting Sun, Rutgers Business School

Motivations
- Deep learning can effectively and automatically extract features from data, especially unstructured or semi-structured data such as video, audio, and text.
- It has achieved great success in speech recognition, object (face) recognition, and textual analysis.
- With a deep learning approach, the sentiment features of a text can be extracted without human intervention.
- Little prior literature has applied deep learning-based textual analysis to auditing.

Objective
- Demonstrate that deep learning can be applied to finance-related text documents to obtain a sentiment feature, an additional attribute that supports audit judgement.
- Provide evidence for the effectiveness of the sentiment features obtained by deep learning by comparing their predictive power with that of the sentiment features obtained by the bag of words approach.

Research Questions
(1) Does the sentiment feature of 10-K MD&As extracted by the deep learning approach provide essential information for financial misstatement prediction?
(2) How effectively does the deep learning approach perform compared with the bag of words approach in terms of prediction accuracy?

What we did
- We analyzed 30,239 MD&As of 10-K filings for fiscal years 2006 to 2015 using the deep learning and bag of words approaches and obtained two sets of sentiment scores, Sentiment_DL and Sentiment_TM, respectively.
- Using the CHAID (Chi-squared Automatic Interaction Detection) algorithm, we built two classification models and compared their predictive performance.
- The results showed that both Model 1 and Model 2 performed better than previous prediction models for financial misstatement.
- The sentiment feature extracted by the deep learning approach generally performed as effectively as that obtained by the bag of words approach.

Financial misstatement prediction
Distinguishing financial misstatement (FM) from fraud
- FM: annual reports that contain a misstatement and have been restated.
- Fraud: an accounting misstatement is fraudulent if committed with intention.
- FM can be seen as a superset of fraud, and it is harder to predict than fraud.
Prior literature on FM prediction
- The misstatement literature, specifically work on prediction with machine learning algorithms, is limited compared with the fraud literature.
- Even less research involves content features of text (such as sentiment): Cecchini, 2005; Larcker and Zakolyukina, 2012.
- In these studies the sample size is relatively small and the predictive performance is modest:
  Larcker and Zakolyukina, 2012: best AUC = 0.597, total sample size = 17,150
  Cecchini, 2005: accuracy = 55.84%, total sample size = 800

Sentiment analysis approaches

Description of the technique
  Deep learning approach: emerging technique employing a deep hierarchical neural network trained with a large amount of text files
  Bag of words approach: prevalent technique using various predefined word lists, each representing a particular sentiment feature
Rationale
  Deep learning approach: understand the meaning of a text file
  Bag of words approach: count the frequency of words originating from a specific dictionary
Output sentiment feature
  Deep learning approach: sentiment scores (Sentiment_DL)
  Bag of words approach: sentiment scores (Sentiment_TM)
Is there prior literature in the accounting and auditing domain?
  Deep learning approach: No
  Bag of words approach: Yes
Tool
  Deep learning approach: Alchemy Language API
  Bag of words approach: Loughran and McDonald (2011)
Is it a finance-specific tool?
  Deep learning approach: No
  Bag of words approach: Yes
Required text document
  Deep learning approach: HTML/text document and webpage
  Bag of words approach: HTML/text document
Does it need data preprocessing?
  Deep learning approach: No
  Bag of words approach: Yes
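The bag of words rationale (counting words that come from a predefined dictionary) can be illustrated with a minimal Python sketch. The slides do not state the scoring formula, so the net-tone ratio used here, (positive count - negative count) / total words, is an assumption, and the two tiny word lists are hypothetical stand-ins for a full finance-specific dictionary such as Loughran and McDonald (2011).

```python
import re

# Hypothetical miniature word lists for illustration only; a real
# finance-specific dictionary contains far more entries.
POSITIVE = {"gain", "improve", "strong", "profitable"}
NEGATIVE = {"loss", "decline", "impairment", "litigation"}


def bag_of_words_sentiment(text):
    """Return (positive - negative) / total word count for a text.

    The formula is an assumed net-tone measure, not necessarily the
    one used to compute Sentiment_TM in the presentation.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return (positive - negative) / len(tokens)


sample = "Revenue showed a strong gain despite a decline in margins"
score = bag_of_words_sentiment(sample)
print(score)
```

Because the score is normalized by document length, long documents such as MD&As would tend to produce values close to zero under this kind of measure.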

Sample

Distribution of misstatements over fiscal years

Sentiment scores

Variable       Obs.     Min       25th percentile   Median    75th percentile   Max
Sentiment_DL   30,239   -0.5606   -0.0289           0.0170    0.0658            0.7487
Sentiment_TM   30,239   -0.0721   -0.0105           -0.0062   -0.0024           0.0307

Classification models

                                           Model 1                                Model 2
Dependent variable                         Misstatement                           Misstatement
Independent variables
  Sentiment measure                        SENTIMENT_TM                           SENTIMENT_DL
  Other predictors (from prior research)   35 variables related to misstatement   35 variables related to misstatement
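CHAID grows a classification tree by testing, at each node, how strongly each candidate predictor is associated with the outcome using a chi-square test. The sketch below shows only that core statistic for one candidate split; it omits the category merging and significance adjustment of the full algorithm, and the contingency counts are hypothetical, not taken from the study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table.

    `table` is a list of rows, for example one row per sentiment bin,
    with columns (misstated, not misstated).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the predictor and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are low and high sentiment bins,
# columns are (misstated, not misstated).
candidate_split = [[30, 70], [10, 90]]
print(chi_square_statistic(candidate_split))
```

A larger statistic indicates a stronger association between the binned predictor and misstatement, which is what makes a predictor a better split candidate.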

Prediction results on the testing data

Metric              Model 1   Model 2
Accuracy            64.23%    65.7%
Type 1 error rate   35.54%    33.32%
Type 2 error rate   37.24%    40.66%
Precision           0.2139    0.2168
Sensitivity         0.6276    0.5934
Specificity         0.6446    0.6668
F1 score            0.3191    0.3176
AUC                 0.68      0.68
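The reported metrics are related by standard definitions: the Type 1 error rate is 1 - specificity, the Type 2 error rate is 1 - sensitivity, and the F1 score is the harmonic mean of precision and sensitivity. The short Python sketch below recomputes these from the precision, sensitivity, and specificity values in the results; the function and variable names are illustrative.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Precision, sensitivity, and specificity as reported in the results.
reported = {
    "Model 1": {"precision": 0.2139, "sensitivity": 0.6276, "specificity": 0.6446},
    "Model 2": {"precision": 0.2168, "sensitivity": 0.5934, "specificity": 0.6668},
}

for name, m in reported.items():
    f1 = round(f1_score(m["precision"], m["sensitivity"]), 4)
    type1 = round(1 - m["specificity"], 4)  # false positive rate
    type2 = round(1 - m["sensitivity"], 4)  # false negative rate
    print(name, "F1:", f1, "Type 1:", type1, "Type 2:", type2)
```

The recomputed values agree with the reported F1 scores and error rates, so the rows of the results are internally consistent.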

Conclusions
The results show that:
(1) The sentiment features generated by both approaches exhibit relatively high predictive accuracy in the two prediction models compared with prior literature of similar sample size.
(2) With the deep learning approach, we are less likely to have Type 1 errors.
(3) With the bag of words approach, we are less likely to have Type 2 errors. A possible reason is that it is a finance-specific approach.
(4) Generally speaking, the deep learning approach performs as effectively as the bag of words approach.