5 Ensemble of Heterogeneous Classifier Model

5.1 Overview

A heterogeneous ensemble of classifiers combines the predictions of multiple base models. Here the term "base model" refers to any classifier model, such as a single classifier or a homogeneous ensemble of classifiers. The term "heterogeneity" refers to the inclusion of different decision techniques, such as classification and regression, in the decision process. Unlike a homogeneous ensemble, a heterogeneous ensemble considers not only like classifiers as base learners but also classifiers from different sources. Homogeneous ensemble methods apply the same base learner to different distributions of the training set, e.g. bagging and boosting. Heterogeneous ensemble methods incorporate different model types into the library of models, the idea being that different base model types can be both accurate and diverse [58]. Drawing on heterogeneous sources provides the ensemble with a great deal of diversity, and such diversity is essential for an ensemble to perform well. In this chapter we use two heterogeneous ensemble-of-classifier models: the stacking and voting schemes. The main objective of this study is to examine the performance of an ensemble of classifiers when additional diversity is introduced by combining classification models with a regression model. Based on the evidence from the previous chapter, we modify the evaluation scheme here. Instead of using all the classifiers from the different families, we select only a few of them for the ensemble, because many of the classifiers show similar behaviour and make similar errors when classifying instances. Adding such similarly natured classifiers contributes no supplementary diversity to the ensemble; it only increases the ensemble's complexity without yielding a significant improvement. Therefore we removed some of the classifiers and use only Naive Bayes, PART, SVM and J48 as base learners in the experiments.

5.2 Stacked Generalization

In machine learning, ensemble methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models [59]. Stacked generalization (or stacking), first proposed by Wolpert in 1992 [26], is a way of combining multiple models that introduces the concept of a meta-learner. Although it is an attractive idea, it is used less often than bagging and boosting in the literature. Stacking differs from other ensemble techniques in that it actively seeks to improve the performance of the ensemble by correcting its errors. It addresses the issue of classifier bias with respect to the training data and aims to learn and exploit these biases to improve classification; this is what is meant by stacked generalization. Stacking combines multiple classifiers generated by different base classifiers C1, C2, ..., Cn on a single dataset of example patterns. In the first phase, a set of base-level classifiers is generated. In the second phase, a meta-level classifier is learned that combines the outputs of the base-level classifiers.
In brief, stacking can be visualized as a method which uses a new classifier to correct the errors of the previously learned classifiers. The algorithm for the Stacked Generalization ensemble of heterogeneous classifiers is given in Figure 5.1.

Figure 5.1: Stacked Generalization Algorithm

5.2.1 Results from the Stacked Generalization ensemble method

In this section we present the results of our experiments with the stacking ensemble of heterogeneous classifiers. We used the single classifier models and the homogeneous ensemble models as base learners in the first phase of stacking. From the single classifier models we considered IBK, Naive Bayes, MLP, SVM, J48, REPTree and Random Tree. In addition, from the homogeneous ensemble models we considered Bagged IBK, Bagged Naive Bayes, Bagged MLP, Bagged SVM, Bagged J48, Bagged REPTree, Bagged Random Tree, Boosted IBK, Boosted Naive Bayes, Boosted MLP, Boosted SVM, Boosted J48, Boosted REPTree, Boosted Random Tree, Decorate IBK, Decorate Naive Bayes, Decorate MLP, Decorate SVM, Decorate J48, Decorate REPTree and Decorate Random Tree. In the second phase of stacking we used linear regression with the M5 attribute-selection criterion and 10-fold cross validation. The results of the stacking ensemble of heterogeneous classifiers are given in Table 5.1.
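As an illustration of the two-phase setup described above, the following sketch uses scikit-learn stand-ins for the Weka learners (GaussianNB for Naive Bayes, KNeighborsClassifier for IBK, DecisionTreeClassifier for J48); the synthetic dataset, the reduced learner list and the logistic-regression meta-learner are illustrative assumptions, not the actual LIDC configuration.

```python
# Illustrative two-phase stacking ensemble (scikit-learn stand-ins for the
# Weka learners used in the experiments; the dataset is synthetic, not LIDC).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Phase 1: heterogeneous base-level classifiers, mixing single models
# with a homogeneous ensemble (a bagged tree).
base_learners = [
    ("nb", GaussianNB()),                             # Naive Bayes
    ("ibk", KNeighborsClassifier()),                  # k-NN, Weka's IBK
    ("svm", SVC()),                                   # support vector machine
    ("j48", DecisionTreeClassifier(random_state=0)),  # C4.5-style tree
    ("bagged_j48", BaggingClassifier(
        DecisionTreeClassifier(random_state=0),
        n_estimators=10, random_state=0)),
]

# Phase 2: a meta-level learner combines the base-level outputs.
# (The experiments used linear regression with M5 attribute selection;
# logistic regression is a common stand-in for classification targets.)
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=10)  # 10-fold CV to build the meta-features
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 2))
```

The key design point is that the meta-learner is trained on cross-validated base-level predictions rather than on the raw features, which is what lets it model and correct the base classifiers' biases.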

Table 5.1: Results from the Stacked Generalization ensemble of classifier method

Annotation          Accuracy  F-measure  RMSE  AUC   Kappa
Calcification       97.78     0.90       0.09  1.00  0.87
Internal Structure  99.58     1.00       0.04  0.96  0.77
Lobulation          87.47     0.90       0.20  0.97  0.83
Malignancy          83.85     0.89       0.22  0.99  0.79
Margin              87.25     0.90       0.20  0.99  0.83
Sphericity          79.69     0.90       0.24  0.96  0.73
Spiculation         91.68     0.95       0.17  0.98  0.88
Subtlety            82.45     0.69       0.23  0.95  0.72
Texture             91.53     0.90       0.16  0.99  0.80

5.2.2 Some Key Observations

1. As far as the Stacked Generalization method is concerned, on average the method performs well and yields reliable results with respect to all the performance metrics, except for the characteristic rating Subtlety.

2. The stacking method gives good Accuracy, RMSE and Kappa values for the Subtlety rating, but for the F-measure metric it gives only 0.69. This result shows the unpredictable behaviour of the classifier model on the LIDC data: although it may yield better results, it cannot be relied upon.

5.3 Voting

Voting is a popular ensemble method [60]. Voting combines the decisions of multiple models according to a combination rule, which can be any of several combinations of probability estimates. The models can be of different types, i.e. the decisions may come from single classifier models, homogeneous ensembles, or even other heterogeneous ensemble models. The scheme used in the voting method is straightforward and closely resembles the majority-voting combination technique used in ensembles such as bagging or AdaBoost. The main difference is that in bagging or AdaBoost the voting scheme acts as the combination rule for the final decision, whereas in the voting ensemble method "voting" refers to a learner which takes the labels from various sources as inputs and uses probability estimates to make the final decision. The popular probability estimates associated with voting are the average of probabilities, majority voting, the product of probabilities, the maximum of probabilities, the minimum of probabilities and the median [56]. In this work we use the voting method with the majority-vote probability estimate.

5.3.1 Results from the Voting ensemble of classifier method

For the voting method we use the same set of classifiers as for the stacking method. The majority-voting combination rule is employed to make the final decision. The results of the experiments with the voting ensemble of heterogeneous classifiers are given in Table 5.2.

Table 5.2: Results from the Voting ensemble of classifier method

Annotation          Accuracy  F-Measure  RMSE  AUC   Kappa
Calcification       97.04     0.88       0.12  0.93  0.82
Internal Structure  99.25     1.00       0.06  0.66  0.46
Lobulation          82.54     0.87       0.26  0.90  0.76
Malignancy          78.52     0.87       0.29  0.89  0.72
Margin              81.84     0.85       0.27  0.90  0.75
Sphericity          73.06     0.87       0.33  0.90  0.64
Spiculation         87.85     0.92       0.22  0.92  0.82
Subtlety            76.73     0.54       0.30  0.75  0.62
Texture             87.44     0.84       0.22  0.92  0.69

5.3.2 Some Key Observations

1. As far as the voting method for combining heterogeneous models is concerned, the overall performance with respect to the different metrics is satisfactory. However, as with the stacking method, the voting scheme also shows unpredictable behaviour on one characteristic rating under the F-measure metric.
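The majority-vote scheme used in these experiments can be sketched as follows, again with hypothetical scikit-learn stand-ins rather than the actual Weka configuration:

```python
# Illustrative majority-vote ensemble over heterogeneous base models
# (hard voting = majority vote; soft voting would average probabilities).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

vote = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("ibk", KNeighborsClassifier()),
        ("svm", SVC()),
        ("j48", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # each model casts one vote; the majority label wins
)
vote.fit(X_tr, y_tr)
print(round(vote.score(X_te, y_te), 2))
```

Unlike stacking, no meta-learner is trained here: the combination rule is fixed in advance, which makes voting cheaper but unable to learn and correct the base classifiers' systematic errors.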

5.4 Class-wise comparison between the Stacked Generalization and Voting ensemble methods

In this section we present a comparative analysis of the stacking and voting schemes with respect to five performance metrics (Accuracy, F-measure, RMSE, AUC and the Kappa statistic) for each characteristic rating. The comparative analysis for the Calcification class is given in Table 5.3, Internal Structure in Table 5.4, Lobulation in Table 5.5, Malignancy in Table 5.6, Margin in Table 5.7, Sphericity in Table 5.8, Spiculation in Table 5.9, Subtlety in Table 5.10 and Texture in Table 5.11.

Table 5.3: Comparison of Stacking and Voting methods over the Calcification rating

Metric     Stacking  Voting
Accuracy   97.78     97.04
F-measure  0.90      0.88
RMSE       0.09      0.12
AUC        1.00      0.93
Kappa      0.87      0.82

Table 5.4: Comparison of Stacking and Voting methods over the Internal Structure rating

Metric     Stacking  Voting
Accuracy   99.58     99.25
F-measure  1.00      1.00
RMSE       0.04      0.06
AUC        0.96      0.66
Kappa      0.77      0.73

Table 5.5: Comparison of Stacking and Voting methods over the Lobulation rating

Metric     Stacking  Voting
Accuracy   87.47     82.54
F-measure  0.90      0.87
RMSE       0.20      0.26
AUC        0.97      0.90
Kappa      0.83      0.76

Table 5.6: Comparison of Stacking and Voting methods over the Malignancy rating

Metric     Stacking  Voting
Accuracy   83.85     78.52
F-measure  0.89      0.87
RMSE       0.22      0.29
AUC        0.97      0.89
Kappa      0.79      0.75

Table 5.7: Comparison of Stacking and Voting methods over the Margin rating

Metric     Stacking  Voting
Accuracy   87.25     81.84
F-measure  0.90      0.85
RMSE       0.20      0.27
AUC        0.99      0.90
Kappa      0.83      0.75

Table 5.8: Comparison of Stacking and Voting methods over the Sphericity rating

Metric     Stacking  Voting
Accuracy   79.69     73.06
F-measure  0.93      0.87
RMSE       0.24      0.33
AUC        0.96      0.90
Kappa      0.73      0.64

Table 5.9: Comparison of Stacking and Voting methods over the Spiculation rating

Metric     Stacking  Voting
Accuracy   91.68     87.85
F-measure  0.95      0.92
RMSE       0.17      0.22
AUC        0.98      0.92
Kappa      0.88      0.82

Table 5.10: Comparison of Stacking and Voting methods over the Subtlety rating

Metric     Stacking  Voting
Accuracy   82.45     76.73
F-measure  0.69      0.54
RMSE       0.23      0.30
AUC        0.95      0.75
Kappa      0.72      0.62

Table 5.11: Comparison of Stacking and Voting methods over the Texture rating

Metric     Stacking  Voting
Accuracy   91.53     87.49
F-measure  0.90      0.84
RMSE       0.16      0.22
AUC        0.99      0.92
Kappa      0.80      0.69
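For reference, the five metrics compared above can be computed from a prediction vector as follows. This uses scikit-learn's implementations (the experiments used Weka's equivalents), and the label and probability vectors are made-up toy values, not LIDC predictions.

```python
# Toy computation of the five reported metrics; y_true / y_pred / y_prob
# are hypothetical values chosen only to exercise the formulas.
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             mean_squared_error, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # reference labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # hard predictions
y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3])  # P(class = 1)

accuracy = accuracy_score(y_true, y_pred) * 100     # reported as a percentage
f_measure = f1_score(y_true, y_pred)                # harmonic mean of P and R
rmse = np.sqrt(mean_squared_error(y_true, y_prob))  # error of the estimates
auc = roc_auc_score(y_true, y_prob)                 # ranking quality
kappa = cohen_kappa_score(y_true, y_pred)           # chance-corrected agreement
print(accuracy, f_measure, rmse, auc, kappa)
```

Note that Accuracy, F-measure and Kappa are computed from the hard labels, while RMSE and AUC use the probability estimates, which is why a classifier can look good on one group of metrics and poor on the other, as observed for the Subtlety rating.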

5.5 Summary

Among the heterogeneous ensemble-of-classifier methods, the results of our experiments on the LIDC data show that the stacking method outperforms the voting method. As far as the prediction performance for the characteristic ratings is concerned, stacked generalization yields better performance than the voting scheme in almost all cases. It is necessary to point out, however, that the unpredictable behaviour of the classifiers observed for the single classifier models and the homogeneous ensembles persists in the heterogeneous ensembles. This unpredictability is not distributed evenly across the classifiers or across the characteristic ratings. In other words, with the single classifier models all the classifiers yield good results on the Subtlety rating, whereas in the homogeneous ensembles the same classifiers, used as base learners on the same rating, give uneven results. We attempted to resolve this issue by using heterogeneous ensembles to provide additional diversity, but the unpredictable behaviour persisted. This in turn signifies that tweaking at the algorithmic level alone gives only a supplementary improvement in the results and fails to provide an accurate basis for choosing the correct methodology for classifying the instances in the LIDC data. This motivated us to investigate the underlying distribution of the data samples for each characteristic rating class and to study the learning strategy at the data level. In the following chapter we use sampling techniques alongside algorithm-level learning to address this issue.