White Paper. Using Sentiment Analysis for Gaining Actionable Insights
|
|
- Lizbeth Parsons
- 6 years ago
- Views:
Transcription
1 corevalue.net White Paper Using Sentiment Analysis for Gaining Actionable Insights Sentiment analysis is a growing business trend that allows companies to better understand their brand, products, and services by analyzing the attitudes, opinions, and emotions expressed by an online audience. Author Olena Domanska CoreValue Data Science Engineer
2 Using sentiment analysis for gaining actionable insights Consumer opinions undoubtedly affect a company s reputation and should be of a high interest to businesses, as they can prove to be extremely valuable assets. Actionable insights provide businesses an advantage over their competition and help them maintain a competitive edge on the market. Today it s easy for consumers to loudly express their satisfaction and their frustration about a company or a product through social media, forums, blogs, and review platforms which can greatly impact public opinion. Sentimental analysis allows businesses to analyze public opinion about a product or service in order to unlock the hidden value contained within. This information, when used correctly, enables them to make better informed business decisions. The notion of sentiment analysis Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information found in source materials. By harnessing the power of sentiment analysis and wrangling all the opinion-related information it contains, businesses can extract tremendous value and use it to their advantage. This data mining requires significant effort, however, as it involves various product/service comparisons, subjectivity and probability defining, emotional components classification, opinion reasoning, and summarizing. In layman s terms, the sentiment analysis engine lurks in social platforms, processes tons of unrestricted data, and derives actionable insights that are directly related to business results. Core techniques of sentiment analysis There are two fundamental ways in which to approach sentiment analysis: supervised and unsupervised (or lexicon-based). The lexicon-based approach rests upon the assumption that the contextual sentiment orientation of the text can be calculated by summing up the sentiment scores of each separate word or phrase. Essentially, this technique relies on external lexical resources that are concerned with mapping words to a categorical class (positive, negative, neutral) or numerical sentiment score. As a result, its effectiveness strongly depends on the quality and adequacy of the chosen resource. While the obvious advantage of the approach is avoiding the arduous step of labeling training data, one must also be aware of its possible limitations. A few examples include instances when a word associated with a positive or negative sentiment actually has opposite orientations in different application domains or, when a sentence containing sentiment words may not express any sentiment at all (in interrogative and conditional sentences). Sentences with a sarcastic tone often warp the polarity of sentiment words and many sentences without sentiment words can also imply opinions. Supervised techniques, on the other hand, work with the notion of training data. Specifically, training samples and the corresponding output values are entered into the algorithm before applying it to the actual data set. This enables the algorithm to handle new unknown data in the future and provide more accurate sentiment classification in specific domains for which it has been trained. The most common supervised learning methods are Naive Bayes classification and Support Vector Machines (SVM) although researchers apply many others as well, including maximum entropy, random forest, neural networks, and regression tree. Recent works in the area shows that supervised approaches tend to overcome unsupervised ones. But is this really true? In this article, we will try to verify this assumption with real data.
3 Sentiment analysis in action Theory aside, the real questions seem to be, How effective is sentiment analysis in practice? and Which approach is more accurate: supervised or unsupervised? In order to figure this out, we decided to analyze all the available reviews for HubSpot, one of the popular marketing automation platforms, from a natural language processing perspective. The script for the following analysis can be found on GitHub. To perform our analysis, we began by closely examining the data we collected in order to discover the most frequently used words and built associations between them to figure out clusters and themes within the reviews. We then examined how review topics changed over time. Finally, we identified the sentiments of consumer opinions by applying alternatively unsupervised and supervised methods, comparing how each performed on our real data. Exploratory Phase Here is the sample of data we gathered: "HubSpot is our main marketing platform. It's currently used to automate our marketing programs, including marketing, landing pages, social media and our blog. We also use the tool to score leads, and automate our lead nurturing process. It's easy to measure the success of our programs through the reporting. HubSpot is great for automating workflow s, creating new campaigns, landing page creation and compiling lists. They have a great training program for gearing up with HubSpot..." Each review was scored by reviewers on a scale from 1 to 10. Here is a distribution of these scores: As you can see, the distribution is strongly left-skewed with a distinct peak at the highest value. It is interesting to see how algorithms perform on such unbalanced real data. To start off, we determined the most frequently used words by building a word cloud with the help of Wordle. Below, you can find the screenshot that illustrates the results.
4 After examining the word cloud, we concluded that people mainly discussed the features of HubSpot s inbound marketing platform and describe them with the words easy, great and amazing. Even though word clouds give us an understanding of which words are most popular in reviews, they don t allow us to determine numerical proportions of their occurrence frequencies. To do this, we built a so-called document-term matrix that shows which terms contain the review and how often they appear. Adding the number of term occurrences for each review, we get the following histogram:
5 The most frequently used words in the review texts are the words hubspot (2531), market (1418), use (1393), and tool (690 occurrences). But how are these words connected to each other? To answer this, we built the following graph, illustrating the associations between these words. The thicker the line connecting two words, the higher the probability of their cooccurrence in a review: We see that words hubspot, lead, tool, as well as hubspot, can, help ; , content, page are usually present together. At this point, the content of the reviews starts to become more clear. Next, we thought it would be interesting to find out what directions of discussions are hidden in reviews texts. Topics Identification To identify topics, we grouped reviews into clusters using the Hierarchical Clustering technique. The results of which are shown below (we first determined the ascending clustering of reviews before constructing the tree with only a few of the uppermost clusters). It was necessary to pick a threshold level to form the groups, so we decided on the simplest and most popular solution, which is to inspect the dendrogram.
6 Hierarchical Clustering 30 uppermost clusters of reviews In our case, threshold at 0.4 level seemed to be a reasonable choice, which revealed 3 clusters, depicted by black, green, and red boxes containing 73, 12, and 15 percents of reviews, respectively. Thereby revealing that 73 out of 100 reviewers discussed the same topic. But what was the topic they discussed? We determined topics of those clusters based on probabilistic modeling of term frequency. Below are a few of the most frequently used words for each topic: Topic 1: "marketing, hubspot, tool, customer, lead, inbound, sale" Topic 2: "hubspot, , social, page, blog, content, website, manage" Topic 3: "hubspot, time, can, make, help, need, get" On the basis of the word vectors listed as topics, we concluded the theme of the reviews within each cluster. For instance, the first group of the reviews concentrates on the idea that Hubspot is a leading marketing tool for increasing customer sales, while the second group of reviews is devoted to discussing Hubspot tools like social media and blog posts publisher, landing page creator, content management and website visitor tracking. Let s discover the trending topics across the three-year time frame:
7 According to this plot, there were just a few reviews on Hubspot until the middle of 2013, with topic 1 proving to be prevalent for yet another year. The highest concentration of reviews occurs at the beginning of 2015, followed by a slowdown which still continues. We continued our sentiment analysis of the reviews using unsupervised and supervised approaches, comparing their accuracy. Sentiment Analysis We began using the unsupervised (lexicon-based) approach, which estimates a record's sentiment by counting the number of occurrences of "positive" and "negative" words and utilizing Hu and Liu's "opinion lexicon". It categorizes around 6,800 words as positive or negative and is available for download here. Other useful resources for lexicon-based sentiment analysis include the MPQA Subjectivity Lexicon, SentiWordNet, SenticNet. To assign a numeric score to each review, we simply subtract the number of negative words from the number of positive words that occur. A new question arose: Should we take into account the length of the review? Consider the following two reviews. The first is fantastic (one word-long, the sentiment score is equal to one) while the second review is several sheets long, expressing both positive and negative thoughts about the product, but with a total score is also equal to one. Obviously, we should rank these reviews differently. One way to take this peculiarity into account is to normalize the score by the length of the review. Not to be unbound, we calculated scores for HubSpot reviews in both cases: with
8 and without normalization to evaluate the sum of squared errors (SSE). As it turns out, normalization reduced the SSE twice (see the details on GitHub). Based on these arguments, we determined our analysis using three steps: count the sentiment score for each review, normalize the score by the review length, map the obtained scores to the interval [1,10] and round them due to the fact that every review has a rating score assigned by reviewers in the range of 1 to 10. As a result, we obtained 10-class classification problem (NORMAL formulation). However, sentiment classification is usually formulated as a two-class classification problem: positive and negative (BINARY formulation), where a review with the rating score from 1 to 4 is considered to be a negative review, while a review with 5 to 10 rating score is considered to be a positive review. It is also possible to use a neutral class and consider a three-class classification problem, by which a review with 1 to 3 rating score is considered to be negative, 4 to 6 - neutral, and 7 to 10 - positive (BASIC formulation). We obtained the following distributions of the reviews classes for each of the formulated sentiment classification problems: After comparing the distribution of reviews scores assigned by reviewers with the distributions of reviews scores obtained with the help of lexicon-based approach, we concluded that the unsupervised (lexicon-based) technique performs well only in the case of binary classification, where accuracy reaches If you consider that normal and basic formulations have accuracies equal to only and 0.19 respectively, our analysis reconfirms the research provided by other scientists who determined the accuracy of classification algorithms depends significantly on the number of classes considered. Taking into account that, according to a series of experiments with Mechanical Turk, humans only agree 79% of the time, this algorithm gives competitive result in the case of two classes. The quantitative perspective of opinions in this case stated that 68.8% of people were positive about the Hubspot product.
9 Now let s consider the supervised approach. We start with a Naive Bayes classifier. The chosen classifier applies Bayes Theorem to predict the class of the given text using a number of previously classified samples of the same type. We divide our reviews into 2 groups: train and test, in order to evaluate the accuracy of the method on the text set. As in the case of lexicon-based approach, we provide NORMAL, BASIC and BINARY formulations. The accuracy of the method turns out to be 0.63, 0.99, and 1 for normal, basic and binary formulations, respectively. It turns out that compared to unsupervised method, one of the simplest supervised models - Naive Bayes classifier - was able to achieve a recall accuracy up to 100% for our biased data. In the landscape of R, the fantastic RTextTools package was developed by Timothy P. Jurka and colleagues for automatic text classification. The package includes nine algorithms for ensemble classification and is designed to conduct supervised learning in less than 10 steps. For sentiment classification of Hubspot reviews, we chose the following 5 of the existed algorithms: support vector machine (SVM), maximum entropy (MAXENT), random forest (RF), classification or regression tree (TREE) and neural networks (NNET), and implemented them for every formulation: NORMAL, BASIC, and BINARY. All of these methods showed almost 100% accuracy in the case of two or three classes and the accuracy in the interval [0.59, 0.72] in the case of 10 numerical classes. The comparison of the obtained results can be observed in the following plot:
10 Conclusions In this article, we provided a thorough comparison of unsupervised and supervised approaches to sentiment analysis using the example of Hubspot platform reviews. Specifically, we have used and evaluated the results from seven different models, including lexicon-based, Naive Bayes classifier, support vector machine, maximum entropy, random forest, classification, and neural networks algorithms. Our examination shows that when data is skewed, both lexicon-based (unsupervised) and machine learning (applied to supervised scenario) techniques perform very well in terms of accuracy in the case of binary classification. As expected, the machine learning methodologies outperformed the lexicon-based method. Overall, the sentiment analysis proves to be a relatively simple and effective tool to extract valuable opinion-based information from source data. This creates the potential for further growth of sentiment analysis by expanding its usage into new areas, much to the benefit of businesses who harness the power of this method.
11 About CoreValue CoreValue, a Software and Technology Services firm headquartered in New Jersey with Development Labs in Eastern Europe, provides Mobility and traditional Cloud based CRM implementation services, Mobile applications in Pharmaceutical, Medical, Media and Telecommunication verticals. Customers trust CoreValue to provide Infrastructure services utilizing premier staff in Data Science, Data Management, Database Services, Quality Assurance and traditional development. CoreValue Services 18 Overlook Ave, Suite 9 Rochelle Park, NJ info@corevalue.net
Twitter Sentiment Classification on Sanders Data using Hybrid Approach
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders
More informationPython Machine Learning
Python Machine Learning Unlock deeper insights into machine learning with this vital guide to cuttingedge predictive analytics Sebastian Raschka [ PUBLISHING 1 open source I community experience distilled
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationRule Learning With Negation: Issues Regarding Effectiveness
Rule Learning With Negation: Issues Regarding Effectiveness S. Chua, F. Coenen, G. Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX Liverpool, United
More informationModule 12. Machine Learning. Version 2 CSE IIT, Kharagpur
Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should
More informationLecture 1: Machine Learning Basics
1/69 Lecture 1: Machine Learning Basics Ali Harakeh University of Waterloo WAVE Lab ali.harakeh@uwaterloo.ca May 1, 2017 2/69 Overview 1 Learning Algorithms 2 Capacity, Overfitting, and Underfitting 3
More informationRule Learning with Negation: Issues Regarding Effectiveness
Rule Learning with Negation: Issues Regarding Effectiveness Stephanie Chua, Frans Coenen, and Grant Malcolm University of Liverpool Department of Computer Science, Ashton Building, Ashton Street, L69 3BX
More informationA Case Study: News Classification Based on Term Frequency
A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationApplications of data mining algorithms to analysis of medical data
Master Thesis Software Engineering Thesis no: MSE-2007:20 August 2007 Applications of data mining algorithms to analysis of medical data Dariusz Matyja School of Engineering Blekinge Institute of Technology
More informationProduct Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments
Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &
More informationSTA 225: Introductory Statistics (CT)
Marshall University College of Science Mathematics Department STA 225: Introductory Statistics (CT) Course catalog description A critical thinking course in applied statistical reasoning covering basic
More informationProbability and Statistics Curriculum Pacing Guide
Unit 1 Terms PS.SPMJ.3 PS.SPMJ.5 Plan and conduct a survey to answer a statistical question. Recognize how the plan addresses sampling technique, randomization, measurement of experimental error and methods
More informationIndian Institute of Technology, Kanpur
Indian Institute of Technology, Kanpur Course Project - CS671A POS Tagging of Code Mixed Text Ayushman Sisodiya (12188) {ayushmn@iitk.ac.in} Donthu Vamsi Krishna (15111016) {vamsi@iitk.ac.in} Sandeep Kumar
More informationLearning From the Past with Experiment Databases
Learning From the Past with Experiment Databases Joaquin Vanschoren 1, Bernhard Pfahringer 2, and Geoff Holmes 2 1 Computer Science Dept., K.U.Leuven, Leuven, Belgium 2 Computer Science Dept., University
More informationSTT 231 Test 1. Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point.
STT 231 Test 1 Fill in the Letter of Your Choice to Each Question in the Scantron. Each question is worth 2 point. 1. A professor has kept records on grades that students have earned in his class. If he
More informationCS 446: Machine Learning
CS 446: Machine Learning Introduction to LBJava: a Learning Based Programming Language Writing classifiers Christos Christodoulopoulos Parisa Kordjamshidi Motivation 2 Motivation You still have not learnt
More informationCS Machine Learning
CS 478 - Machine Learning Projects Data Representation Basic testing and evaluation schemes CS 478 Data and Testing 1 Programming Issues l Program in any platform you want l Realize that you will be doing
More informationPredicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks
Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks Devendra Singh Chaplot, Eunhee Rhim, and Jihie Kim Samsung Electronics Co., Ltd. Seoul, South Korea {dev.chaplot,eunhee.rhim,jihie.kim}@samsung.com
More informationStatewide Framework Document for:
Statewide Framework Document for: 270301 Standards may be added to this document prior to submission, but may not be removed from the framework to meet state credit equivalency requirements. Performance
More informationSwitchboard Language Model Improvement with Conversational Data from Gigaword
Katholieke Universiteit Leuven Faculty of Engineering Master in Artificial Intelligence (MAI) Speech and Language Technology (SLT) Switchboard Language Model Improvement with Conversational Data from Gigaword
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationImpact of Cluster Validity Measures on Performance of Hybrid Models Based on K-means and Decision Trees
Impact of Cluster Validity Measures on Performance of Hybrid Models Based on K-means and Decision Trees Mariusz Łapczy ski 1 and Bartłomiej Jefma ski 2 1 The Chair of Market Analysis and Marketing Research,
More informationVisit us at:
White Paper Integrating Six Sigma and Software Testing Process for Removal of Wastage & Optimizing Resource Utilization 24 October 2013 With resources working for extended hours and in a pressurized environment,
More informationGenerative models and adversarial training
Day 4 Lecture 1 Generative models and adversarial training Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University What is a generative model?
More informationADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY DOWNLOAD EBOOK : ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY PDF
Read Online and Download Ebook ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY DOWNLOAD EBOOK : ADVANCED MACHINE LEARNING WITH PYTHON BY JOHN HEARTY PDF Click link bellow and free register to download
More informationIntroduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition
Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University Outline Introduction Bias and
More informationAlgebra 1, Quarter 3, Unit 3.1. Line of Best Fit. Overview
Algebra 1, Quarter 3, Unit 3.1 Line of Best Fit Overview Number of instructional days 6 (1 day assessment) (1 day = 45 minutes) Content to be learned Analyze scatter plots and construct the line of best
More informationAnalyzing the Usage of IT in SMEs
IBIMA Publishing Communications of the IBIMA http://www.ibimapublishing.com/journals/cibima/cibima.html Vol. 2010 (2010), Article ID 208609, 10 pages DOI: 10.5171/2010.208609 Analyzing the Usage of IT
More informationHuman Emotion Recognition From Speech
RESEARCH ARTICLE OPEN ACCESS Human Emotion Recognition From Speech Miss. Aparna P. Wanare*, Prof. Shankar N. Dandare *(Department of Electronics & Telecommunication Engineering, Sant Gadge Baba Amravati
More informationA Comparison of Two Text Representations for Sentiment Analysis
010 International Conference on Computer Application and System Modeling (ICCASM 010) A Comparison of Two Text Representations for Sentiment Analysis Jianxiong Wang School of Computer Science & Educational
More informationHistorical maintenance relevant information roadmap for a self-learning maintenance prediction procedural approach
IOP Conference Series: Materials Science and Engineering PAPER OPEN ACCESS Historical maintenance relevant information roadmap for a self-learning maintenance prediction procedural approach To cite this
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationAGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS
AGS THE GREAT REVIEW GAME FOR PRE-ALGEBRA (CD) CORRELATED TO CALIFORNIA CONTENT STANDARDS 1 CALIFORNIA CONTENT STANDARDS: Chapter 1 ALGEBRA AND WHOLE NUMBERS Algebra and Functions 1.4 Students use algebraic
More informationOPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS
OPTIMIZATINON OF TRAINING SETS FOR HEBBIAN-LEARNING- BASED CLASSIFIERS Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba University of Ostrava Department of Informatics and Computers Dvořákova 7,
More informationGrade 6: Correlated to AGS Basic Math Skills
Grade 6: Correlated to AGS Basic Math Skills Grade 6: Standard 1 Number Sense Students compare and order positive and negative integers, decimals, fractions, and mixed numbers. They find multiples and
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Thomas Hofmann Presentation by Ioannis Pavlopoulos & Andreas Damianou for the course of Data Mining & Exploration 1 Outline Latent Semantic Analysis o Need o Overview
More information1 3-5 = Subtraction - a binary operation
High School StuDEnts ConcEPtions of the Minus Sign Lisa L. Lamb, Jessica Pierson Bishop, and Randolph A. Philipp, Bonnie P Schappelle, Ian Whitacre, and Mindy Lewis - describe their research with students
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Ch 2 Test Remediation Work Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) High temperatures in a certain
More informationSouth Carolina English Language Arts
South Carolina English Language Arts A S O F J U N E 2 0, 2 0 1 0, T H I S S TAT E H A D A D O P T E D T H E CO M M O N CO R E S TAT E S TA N DA R D S. DOCUMENTS REVIEWED South Carolina Academic Content
More informationAnalysis of Emotion Recognition System through Speech Signal Using KNN & GMM Classifier
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 10, Issue 2, Ver.1 (Mar - Apr.2015), PP 55-61 www.iosrjournals.org Analysis of Emotion
More informationLecture 1: Basic Concepts of Machine Learning
Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010
More informationA Vector Space Approach for Aspect-Based Sentiment Analysis
A Vector Space Approach for Aspect-Based Sentiment Analysis by Abdulaziz Alghunaim B.S., Massachusetts Institute of Technology (2015) Submitted to the Department of Electrical Engineering and Computer
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationAustralian Journal of Basic and Applied Sciences
AENSI Journals Australian Journal of Basic and Applied Sciences ISSN:1991-8178 Journal home page: www.ajbasweb.com Feature Selection Technique Using Principal Component Analysis For Improving Fuzzy C-Mean
More informationScienceDirect. A Framework for Clustering Cardiac Patient s Records Using Unsupervised Learning Techniques
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 98 (2016 ) 368 373 The 6th International Conference on Current and Future Trends of Information and Communication Technologies
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationEXAMINING THE DEVELOPMENT OF FIFTH AND SIXTH GRADE STUDENTS EPISTEMIC CONSIDERATIONS OVER TIME THROUGH AN AUTOMATED ANALYSIS OF EMBEDDED ASSESSMENTS
EXAMINING THE DEVELOPMENT OF FIFTH AND SIXTH GRADE STUDENTS EPISTEMIC CONSIDERATIONS OVER TIME THROUGH AN AUTOMATED ANALYSIS OF EMBEDDED ASSESSMENTS Joshua M. Rosenberg and Christina V. Schwarz Michigan
More informationNetpix: A Method of Feature Selection Leading. to Accurate Sentiment-Based Classification Models
Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models 1 Netpix: A Method of Feature Selection Leading to Accurate Sentiment-Based Classification Models James B.
More informationLarge-Scale Web Page Classification. Sathi T Marath. Submitted in partial fulfilment of the requirements. for the degree of Doctor of Philosophy
Large-Scale Web Page Classification by Sathi T Marath Submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at Dalhousie University Halifax, Nova Scotia November 2010
More informationLecture 10: Reinforcement Learning
Lecture 1: Reinforcement Learning Cognitive Systems II - Machine Learning SS 25 Part III: Learning Programs and Strategies Q Learning, Dynamic Programming Lecture 1: Reinforcement Learning p. Motivation
More informationChapters 1-5 Cumulative Assessment AP Statistics November 2008 Gillespie, Block 4
Chapters 1-5 Cumulative Assessment AP Statistics Name: November 2008 Gillespie, Block 4 Part I: Multiple Choice This portion of the test will determine 60% of your overall test grade. Each question is
More informationExtending Place Value with Whole Numbers to 1,000,000
Grade 4 Mathematics, Quarter 1, Unit 1.1 Extending Place Value with Whole Numbers to 1,000,000 Overview Number of Instructional Days: 10 (1 day = 45 minutes) Content to Be Learned Recognize that a digit
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationSchool Inspection in Hesse/Germany
Hessisches Kultusministerium School Inspection in Hesse/Germany Contents 1. Introduction...2 2. School inspection as a Procedure for Quality Assurance and Quality Enhancement...2 3. The Hessian framework
More informationWE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT
WE GAVE A LAWYER BASIC MATH SKILLS, AND YOU WON T BELIEVE WHAT HAPPENED NEXT PRACTICAL APPLICATIONS OF RANDOM SAMPLING IN ediscovery By Matthew Verga, J.D. INTRODUCTION Anyone who spends ample time working
More informationEdexcel GCSE. Statistics 1389 Paper 1H. June Mark Scheme. Statistics Edexcel GCSE
Edexcel GCSE Statistics 1389 Paper 1H June 2007 Mark Scheme Edexcel GCSE Statistics 1389 NOTES ON MARKING PRINCIPLES 1 Types of mark M marks: method marks A marks: accuracy marks B marks: unconditional
More informationarxiv: v1 [cs.lg] 3 May 2013
Feature Selection Based on Term Frequency and T-Test for Text Categorization Deqing Wang dqwang@nlsde.buaa.edu.cn Hui Zhang hzhang@nlsde.buaa.edu.cn Rui Liu, Weifeng Lv {liurui,lwf}@nlsde.buaa.edu.cn arxiv:1305.0638v1
More informationCEFR Overall Illustrative English Proficiency Scales
CEFR Overall Illustrative English Proficiency s CEFR CEFR OVERALL ORAL PRODUCTION Has a good command of idiomatic expressions and colloquialisms with awareness of connotative levels of meaning. Can convey
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationRadius STEM Readiness TM
Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and
More informationAP Statistics Summer Assignment 17-18
AP Statistics Summer Assignment 17-18 Welcome to AP Statistics. This course will be unlike any other math class you have ever taken before! Before taking this course you will need to be competent in basic
More informationSchool Size and the Quality of Teaching and Learning
School Size and the Quality of Teaching and Learning An Analysis of Relationships between School Size and Assessments of Factors Related to the Quality of Teaching and Learning in Primary Schools Undertaken
More informationVOL. 3, NO. 5, May 2012 ISSN Journal of Emerging Trends in Computing and Information Sciences CIS Journal. All rights reserved.
Exploratory Study on Factors that Impact / Influence Success and failure of Students in the Foundation Computer Studies Course at the National University of Samoa 1 2 Elisapeta Mauai, Edna Temese 1 Computing
More informationMontana Content Standards for Mathematics Grade 3. Montana Content Standards for Mathematical Practices and Mathematics Content Adopted November 2011
Montana Content Standards for Mathematics Grade 3 Montana Content Standards for Mathematical Practices and Mathematics Content Adopted November 2011 Contents Standards for Mathematical Practice: Grade
More informationIdentification of Opinion Leaders Using Text Mining Technique in Virtual Community
Identification of Opinion Leaders Using Text Mining Technique in Virtual Community Chihli Hung Department of Information Management Chung Yuan Christian University Taiwan 32023, R.O.C. chihli@cycu.edu.tw
More informationReducing Features to Improve Bug Prediction
Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science
More informationB. How to write a research paper
From: Nikolaus Correll. "Introduction to Autonomous Robots", ISBN 1493773070, CC-ND 3.0 B. How to write a research paper The final deliverable of a robotics class often is a write-up on a research project,
More informationSpinners at the School Carnival (Unequal Sections)
Spinners at the School Carnival (Unequal Sections) Maryann E. Huey Drake University maryann.huey@drake.edu Published: February 2012 Overview of the Lesson Students are asked to predict the outcomes of
More informationThe Evolution of Random Phenomena
The Evolution of Random Phenomena A Look at Markov Chains Glen Wang glenw@uchicago.edu Splash! Chicago: Winter Cascade 2012 Lecture 1: What is Randomness? What is randomness? Can you think of some examples
More informationMachine Learning and Data Mining. Ensembles of Learners. Prof. Alexander Ihler
Machine Learning and Data Mining Ensembles of Learners Prof. Alexander Ihler Ensemble methods Why learn one classifier when you can learn many? Ensemble: combine many predictors (Weighted) combina
More informationCalculators in a Middle School Mathematics Classroom: Helpful or Harmful?
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Action Research Projects Math in the Middle Institute Partnership 7-2008 Calculators in a Middle School Mathematics Classroom:
More informationMaximizing Learning Through Course Alignment and Experience with Different Types of Knowledge
Innov High Educ (2009) 34:93 103 DOI 10.1007/s10755-009-9095-2 Maximizing Learning Through Course Alignment and Experience with Different Types of Knowledge Phyllis Blumberg Published online: 3 February
More informationMining Association Rules in Student s Assessment Data
www.ijcsi.org 211 Mining Association Rules in Student s Assessment Data Dr. Varun Kumar 1, Anupama Chadha 2 1 Department of Computer Science and Engineering, MVN University Palwal, Haryana, India 2 Anupama
More informationIntroduction to Causal Inference. Problem Set 1. Required Problems
Introduction to Causal Inference Problem Set 1 Professor: Teppei Yamamoto Due Friday, July 15 (at beginning of class) Only the required problems are due on the above date. The optional problems will not
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationIssues in the Mining of Heart Failure Datasets
International Journal of Automation and Computing 11(2), April 2014, 162-179 DOI: 10.1007/s11633-014-0778-5 Issues in the Mining of Heart Failure Datasets Nongnuch Poolsawad 1 Lisa Moore 1 Chandrasekhar
More informationUniversidade do Minho Escola de Engenharia
Universidade do Minho Escola de Engenharia Universidade do Minho Escola de Engenharia Dissertação de Mestrado Knowledge Discovery is the nontrivial extraction of implicit, previously unknown, and potentially
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationCSL465/603 - Machine Learning
CSL465/603 - Machine Learning Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Introduction CSL465/603 - Machine Learning 1 Administrative Trivia Course Structure 3-0-2 Lecture Timings Monday 9.55-10.45am
More informationMathematics Scoring Guide for Sample Test 2005
Mathematics Scoring Guide for Sample Test 2005 Grade 4 Contents Strand and Performance Indicator Map with Answer Key...................... 2 Holistic Rubrics.......................................................
More informationA GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING
A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING Yong Sun, a * Colin Fidge b and Lin Ma a a CRC for Integrated Engineering Asset Management, School of Engineering Systems, Queensland
More informationMath 96: Intermediate Algebra in Context
: Intermediate Algebra in Context Syllabus Spring Quarter 2016 Daily, 9:20 10:30am Instructor: Lauri Lindberg Office Hours@ tutoring: Tutoring Center (CAS-504) 8 9am & 1 2pm daily STEM (Math) Center (RAI-338)
More informationHigher education is becoming a major driver of economic competitiveness
Executive Summary Higher education is becoming a major driver of economic competitiveness in an increasingly knowledge-driven global economy. The imperative for countries to improve employment skills calls
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationUsing Web Searches on Important Words to Create Background Sets for LSI Classification
Using Web Searches on Important Words to Create Background Sets for LSI Classification Sarah Zelikovitz and Marina Kogan College of Staten Island of CUNY 2800 Victory Blvd Staten Island, NY 11314 Abstract
More informationArizona s English Language Arts Standards th Grade ARIZONA DEPARTMENT OF EDUCATION HIGH ACADEMIC STANDARDS FOR STUDENTS
Arizona s English Language Arts Standards 11-12th Grade ARIZONA DEPARTMENT OF EDUCATION HIGH ACADEMIC STANDARDS FOR STUDENTS 11 th -12 th Grade Overview Arizona s English Language Arts Standards work together
More informationEducator s e-portfolio in the Modern University
Educator s e-portfolio in the Modern University Nataliia Morze 1, Liliia Varchenko-Trotsenko 1 1 Borys Grinchenko Kyiv University, 18/2 Bulvarno-Kudriavska Str, Kyiv, Ukraine, n.morze@kubg.edu.ua, l.varchenko@kubg.edu.ua
More informationSpeech Emotion Recognition Using Support Vector Machine
Speech Emotion Recognition Using Support Vector Machine Yixiong Pan, Peipei Shen and Liping Shen Department of Computer Technology Shanghai JiaoTong University, Shanghai, China panyixiong@sjtu.edu.cn,
More informationTHE WEB 2.0 AS A PLATFORM FOR THE ACQUISITION OF SKILLS, IMPROVE ACADEMIC PERFORMANCE AND DESIGNER CAREER PROMOTION IN THE UNIVERSITY
THE WEB 2.0 AS A PLATFORM FOR THE ACQUISITION OF SKILLS, IMPROVE ACADEMIC PERFORMANCE AND DESIGNER CAREER PROMOTION IN THE UNIVERSITY F. Felip Miralles, S. Martín Martín, Mª L. García Martínez, J.L. Navarro
More informationA Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique
A Coding System for Dynamic Topic Analysis: A Computer-Mediated Discourse Analysis Technique Hiromi Ishizaki 1, Susan C. Herring 2, Yasuhiro Takishima 1 1 KDDI R&D Laboratories, Inc. 2 Indiana University
More informationLesson M4. page 1 of 2
Lesson M4 page 1 of 2 Miniature Gulf Coast Project Math TEKS Objectives 111.22 6b.1 (A) apply mathematics to problems arising in everyday life, society, and the workplace; 6b.1 (C) select tools, including
More informationlearning collegiate assessment]
[ collegiate learning assessment] INSTITUTIONAL REPORT 2005 2006 Kalamazoo College council for aid to education 215 lexington avenue floor 21 new york new york 10016-6023 p 212.217.0700 f 212.661.9766
More informationBusiness Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence
Business Analytics and Information Tech COURSE NUMBER: 33:136:494 COURSE TITLE: Data Mining and Business Intelligence COURSE DESCRIPTION This course presents computing tools and concepts for all stages
More informationTABE 9&10. Revised 8/2013- with reference to College and Career Readiness Standards
TABE 9&10 Revised 8/2013- with reference to College and Career Readiness Standards LEVEL E Test 1: Reading Name Class E01- INTERPRET GRAPHIC INFORMATION Signs Maps Graphs Consumer Materials Forms Dictionary
More informationSURVIVING ON MARS WITH GEOGEBRA
SURVIVING ON MARS WITH GEOGEBRA Lindsey States and Jenna Odom Miami University, OH Abstract: In this paper, the authors describe an interdisciplinary lesson focused on determining how long an astronaut
More informationUsing Games with a Purpose and Bootstrapping to Create Domain-Specific Sentiment Lexicons
Using Games with a Purpose and Bootstrapping to Create Domain-Specific Sentiment Lexicons Albert Weichselbraun University of Applied Sciences HTW Chur Ringstraße 34 7000 Chur, Switzerland albert.weichselbraun@htwchur.ch
More informationUsing focal point learning to improve human machine tacit coordination
DOI 10.1007/s10458-010-9126-5 Using focal point learning to improve human machine tacit coordination InonZuckerman SaritKraus Jeffrey S. Rosenschein The Author(s) 2010 Abstract We consider an automated
More information