5 Ensemble of Heterogeneous Classifier Model

5.1 Overview

A heterogeneous ensemble of classifiers combines the predictions of multiple base models. Here the term "base model" refers to any classifier model, such as a single classifier or a homogeneous ensemble of classifiers. The term "heterogeneity" refers to the inclusion of different decision techniques, such as classification and regression, in the decision process. Unlike a homogeneous ensemble, a heterogeneous ensemble considers not only like classifiers as base learners but also classifiers from different sources. Homogeneous ensemble methods apply the same base learner to different distributions of the training set, e.g. bagging and boosting. Heterogeneous ensemble methods incorporate different model types into the library of models, the idea being that different base model types can be both accurate and diverse [58]. Drawing on heterogeneous sources provides the ensemble with a great deal of diversity, and such diversity is essential for an ensemble to perform well. In this chapter we use two heterogeneous ensemble-of-classifier models: the stacking and voting schemes. The main objective of this study is to examine the performance of an ensemble of classifiers when additional diversity is introduced by combining classification models with a regression model. Based on the evidence from the previous chapter, we modify the evaluation scheme here. Instead of using all the classifiers from the different families, we select only a few of them for the ensemble, because many of the classifiers show similar behaviour and make similar errors when classifying instances. Adding such similarly natured classifiers contributes no supplementary diversity to the ensemble; it only increases the ensemble's complexity without yielding a significant improvement. Therefore we removed some of the classifiers and use only Naive Bayes, PART, SVM and J48 as base learners in the experiments.

5.2 Stacked Generalization

In machine learning, ensemble methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models [59]. Stacked generalization (or stacking), first proposed by Wolpert in 1992 [26], is a way of combining multiple models that introduces the concept of a meta-learner. Although it is an attractive idea, it is used less often than bagging and boosting in the literature. Stacking differs from other ensemble techniques in that it actively seeks to improve the performance of the ensemble by correcting its errors. It addresses the issue of classifier bias with respect to the training data and aims to learn and exploit these biases to improve classification; this is what is meant by stacked generalization. Stacking combines multiple classifiers generated by different base classifiers C1, C2, ..., Cn on a single dataset of example patterns. In the first phase, a set of base-level classifiers is generated. In the second phase, a meta-level classifier is learned that combines the outputs of the base-level classifiers.
In brief, stacking can be visualized as a method which uses a new classifier to correct the errors of the previously learned classifiers. The algorithm for the Stacked Generalization ensemble of heterogeneous classifiers is given in Figure 5.1.

Figure 5.1: Stacked Generalization Algorithm

5.2.1 Results from the Stacked Generalization ensemble method

In this section we present the results of our experiments with the stacking ensemble of heterogeneous classifiers. We used the single classifier models and the homogeneous ensemble models as base learners in the first phase of stacking. From the single classifier models we considered IBK, Naive Bayes, MLP, SVM, J48, REPTree and Random Tree. In addition, from the homogeneous ensemble models we considered Bagged IBK, Bagged Naive Bayes, Bagged MLP, Bagged SVM, Bagged J48, Bagged REPTree, Bagged Random Tree, Boosted IBK, Boosted Naive Bayes, Boosted MLP, Boosted SVM, Boosted J48, Boosted REPTree, Boosted Random Tree, Decorate IBK, Decorate Naive Bayes, Decorate MLP, Decorate SVM, Decorate J48, Decorate REPTree and Decorate Random Tree. In the second phase of stacking we used linear regression with the M5 attribute-selection criterion and 10-fold cross validation. The results of the stacking ensemble of heterogeneous classifiers are given in Table 5.1.
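As an illustration of the two-phase setup described above, the following sketch uses scikit-learn stand-ins for the Weka learners (GaussianNB for Naive Bayes, KNeighborsClassifier for IBK, DecisionTreeClassifier for J48); the synthetic dataset, the reduced learner list and the logistic-regression meta-learner are illustrative assumptions, not the actual LIDC configuration.

```python
# Illustrative two-phase stacking ensemble (scikit-learn stand-ins for the
# Weka learners used in the experiments; the dataset is synthetic, not LIDC).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Phase 1: heterogeneous base-level classifiers, mixing single models
# with a homogeneous ensemble (a bagged tree).
base_learners = [
    ("nb", GaussianNB()),                             # Naive Bayes
    ("ibk", KNeighborsClassifier()),                  # k-NN, Weka's IBK
    ("svm", SVC()),                                   # support vector machine
    ("j48", DecisionTreeClassifier(random_state=0)),  # C4.5-style tree
    ("bagged_j48", BaggingClassifier(
        DecisionTreeClassifier(random_state=0),
        n_estimators=10, random_state=0)),
]

# Phase 2: a meta-level learner combines the base-level outputs.
# (The experiments used linear regression with M5 attribute selection;
# logistic regression is a common stand-in for classification targets.)
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=10)  # 10-fold CV to build the meta-features
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 2))
```

The key design point is that the meta-learner is trained on cross-validated base-level predictions rather than on the raw features, which is what lets it model and correct the base classifiers' biases.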

Table 5.1: Results from the Stacked Generalization ensemble of classifier method

Annotation          Accuracy  F-measure  RMSE  AUC   Kappa
Calcification       97.78     0.90       0.09  1.00  0.87
Internal Structure  99.58     1.00       0.04  0.96  0.77
Lobulation          87.47     0.90       0.20  0.97  0.83
Malignancy          83.85     0.89       0.22  0.99  0.79
Margin              87.25     0.90       0.20  0.99  0.83
Sphericity          79.69     0.90       0.24  0.96  0.73
Spiculation         91.68     0.95       0.17  0.98  0.88
Subtlety            82.45     0.69       0.23  0.95  0.72
Texture             91.53     0.90       0.16  0.99  0.80

5.2.2 Some Key Observations

1. As far as the Stacked Generalization method is concerned, on average the method performs well and yields reliable results with respect to all the performance metrics, except for the characteristic rating Subtlety.

2. The stacking method gives good Accuracy, RMSE and Kappa values for the Subtlety rating, but for the F-measure metric it gives only 0.69. This result shows the unpredictable behaviour of the classifier model on the LIDC data: although it may yield better results, it cannot be relied upon.

5.3 Voting

Voting is a popular ensemble method [60]. Voting combines the decisions of multiple models according to a combination rule, which can be any of several combinations of probability estimates. The models can be of different types, i.e. the decisions may come from single classifier models, homogeneous ensembles, or even other heterogeneous ensemble models. The scheme used in the voting method is straightforward and closely resembles the majority-voting combination technique used in ensembles such as bagging or AdaBoost. The main difference is that in bagging or AdaBoost the voting scheme acts as the combination rule for the final decision, whereas in the voting ensemble method "voting" refers to a learner which takes the labels from various sources as inputs and uses probability estimates to make the final decision. The popular probability estimates associated with voting are the average of probabilities, majority voting, the product of probabilities, the maximum of probabilities, the minimum of probabilities and the median [56]. In this work we use the voting method with the majority-vote probability estimate.

5.3.1 Results from the Voting ensemble of classifier method

For the voting method we use the same set of classifiers as for the stacking method. The majority-voting combination rule is employed to make the final decision. The results of the experiments with the voting ensemble of heterogeneous classifiers are given in Table 5.2.

Table 5.2: Results from the Voting ensemble of classifier method

Annotation          Accuracy  F-Measure  RMSE  AUC   Kappa
Calcification       97.04     0.88       0.12  0.93  0.82
Internal Structure  99.25     1.00       0.06  0.66  0.46
Lobulation          82.54     0.87       0.26  0.90  0.76
Malignancy          78.52     0.87       0.29  0.89  0.72
Margin              81.84     0.85       0.27  0.90  0.75
Sphericity          73.06     0.87       0.33  0.90  0.64
Spiculation         87.85     0.92       0.22  0.92  0.82
Subtlety            76.73     0.54       0.30  0.75  0.62
Texture             87.44     0.84       0.22  0.92  0.69

5.3.2 Some Key Observations

1. As far as the voting method for combining heterogeneous models is concerned, the overall performance with respect to the different metrics is satisfactory. However, as with the stacking method, the voting scheme also shows unpredictable behaviour on one characteristic rating under the F-measure metric.
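The majority-vote scheme used in these experiments can be sketched as follows, again with hypothetical scikit-learn stand-ins rather than the actual Weka configuration:

```python
# Illustrative majority-vote ensemble over heterogeneous base models
# (hard voting = majority vote; soft voting would average probabilities).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

vote = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("ibk", KNeighborsClassifier()),
        ("svm", SVC()),
        ("j48", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # each model casts one vote; the majority label wins
)
vote.fit(X_tr, y_tr)
print(round(vote.score(X_te, y_te), 2))
```

Unlike stacking, no meta-learner is trained here: the combination rule is fixed in advance, which makes voting cheaper but unable to learn and correct the base classifiers' systematic errors.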

5.4 Class-wise comparison between the Stacked Generalization and Voting ensemble methods

In this section we present a comparative analysis of the stacking and voting schemes with respect to five performance metrics (Accuracy, F-measure, RMSE, AUC and the Kappa statistic) for each characteristic rating. The comparative analysis for the Calcification class is given in Table 5.3, Internal Structure in Table 5.4, Lobulation in Table 5.5, Malignancy in Table 5.6, Margin in Table 5.7, Sphericity in Table 5.8, Spiculation in Table 5.9, Subtlety in Table 5.10 and Texture in Table 5.11.

Table 5.3: Comparison of Stacking and Voting methods over the Calcification rating

Metric     Stacking  Voting
Accuracy   97.78     97.04
F-measure  0.90      0.88
RMSE       0.09      0.12
AUC        1.00      0.93
Kappa      0.87      0.82

Table 5.4: Comparison of Stacking and Voting methods over the Internal Structure rating

Metric     Stacking  Voting
Accuracy   99.58     99.25
F-measure  1.00      1.00
RMSE       0.04      0.06
AUC        0.96      0.66
Kappa      0.77      0.73

Table 5.5: Comparison of Stacking and Voting methods over the Lobulation rating

Metric     Stacking  Voting
Accuracy   87.47     82.54
F-measure  0.90      0.87
RMSE       0.20      0.26
AUC        0.97      0.90
Kappa      0.83      0.76

Table 5.6: Comparison of Stacking and Voting methods over the Malignancy rating

Metric     Stacking  Voting
Accuracy   83.85     78.52
F-measure  0.89      0.87
RMSE       0.22      0.29
AUC        0.97      0.89
Kappa      0.79      0.75

Table 5.7: Comparison of Stacking and Voting methods over the Margin rating

Metric     Stacking  Voting
Accuracy   87.25     81.84
F-measure  0.90      0.85
RMSE       0.20      0.27
AUC        0.99      0.90
Kappa      0.83      0.75

Table 5.8: Comparison of Stacking and Voting methods over the Sphericity rating

Metric     Stacking  Voting
Accuracy   79.69     73.06
F-measure  0.93      0.87
RMSE       0.24      0.33
AUC        0.96      0.90
Kappa      0.73      0.64

Table 5.9: Comparison of Stacking and Voting methods over the Spiculation rating

Metric     Stacking  Voting
Accuracy   91.68     87.85
F-measure  0.95      0.92
RMSE       0.17      0.22
AUC        0.98      0.92
Kappa      0.88      0.82

Table 5.10: Comparison of Stacking and Voting methods over the Subtlety rating

Metric     Stacking  Voting
Accuracy   82.45     76.73
F-measure  0.69      0.54
RMSE       0.23      0.30
AUC        0.95      0.75
Kappa      0.72      0.62

Table 5.11: Comparison of Stacking and Voting methods over the Texture rating

Metric     Stacking  Voting
Accuracy   91.53     87.49
F-measure  0.90      0.84
RMSE       0.16      0.22
AUC        0.99      0.92
Kappa      0.80      0.69
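For reference, the five metrics compared above can be computed from a prediction vector as follows. This uses scikit-learn's implementations (the experiments used Weka's equivalents), and the label and probability vectors are made-up toy values, not LIDC predictions.

```python
# Toy computation of the five reported metrics; y_true / y_pred / y_prob
# are hypothetical values chosen only to exercise the formulas.
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             mean_squared_error, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # reference labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # hard predictions
y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3])  # P(class = 1)

accuracy = accuracy_score(y_true, y_pred) * 100     # reported as a percentage
f_measure = f1_score(y_true, y_pred)                # harmonic mean of P and R
rmse = np.sqrt(mean_squared_error(y_true, y_prob))  # error of the estimates
auc = roc_auc_score(y_true, y_prob)                 # ranking quality
kappa = cohen_kappa_score(y_true, y_pred)           # chance-corrected agreement
print(accuracy, f_measure, rmse, auc, kappa)
```

Note that Accuracy, F-measure and Kappa are computed from the hard labels, while RMSE and AUC use the probability estimates, which is why a classifier can look good on one group of metrics and poor on the other, as observed for the Subtlety rating.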

5.5 Summary

Among the heterogeneous ensemble-of-classifier methods, the results of our experiments on the LIDC data show that the stacking method outperforms the voting method. As far as the prediction performance for the characteristic ratings is concerned, stacked generalization yields better performance than the voting scheme in almost all cases. It is necessary to point out, however, that the unpredictable behaviour of the classifiers observed for the single classifier models and the homogeneous ensembles persists in the heterogeneous ensembles. This unpredictability is not distributed evenly across the classifiers or across the characteristic ratings. In other words, with the single classifier models all the classifiers yield good results on the Subtlety rating, whereas in the homogeneous ensembles the same classifiers, used as base learners on the same rating, give uneven results. We attempted to resolve this issue by using heterogeneous ensembles to provide additional diversity, but the unpredictable behaviour persisted. This in turn signifies that tweaking at the algorithmic level alone gives only a supplementary improvement in the results and fails to provide an accurate basis for choosing the correct methodology for classifying the instances in the LIDC data. This motivated us to investigate the underlying distribution of the data samples for each characteristic rating class and to study the learning strategy at the data level. In the following chapter we use sampling techniques alongside algorithm-level learning to address this issue.