An Empirical Study on Combining Instance-Based and Rule-Based Classifiers


From: AAAI Technical Report SS-98-04. Compilation copyright 1998, AAAI (www.aaai.org). All rights reserved.

An Empirical Study on Combining Instance-Based and Rule-Based Classifiers

Jerzy Surma
Department of Computer Science, Technical University of Wroclaw
Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
Email: surma@ci.pwr.wroc.pl

Koen Vanhoof
Faculty of Applied Economic Science, Limburgs University Center
B-3590 Diepenbeek, Gebouw D, Belgium
Email: vanhoof@rsftew.luc.ac.be

Abstract

One of the most important challenges in developing problem solving methods is to combine and synergistically utilize general and specific knowledge. This paper presents one possible way of performing this integration, which might be generally described as follows: "To solve a problem, first try to use the conventional rule-based approach. If it does not work, try to find a similar problem you have solved in the past and adapt the old solution to the new situation". We applied this heuristic to a classification task. The background concepts of this heuristic are standard cases (the source of data for the rules) and exceptional cases (representative of the specific knowledge). The presented empirical study has not shown this attempt to be successful in terms of accuracy when it is compared with its parent methods, the instance-based and rule-based approaches, but a careful policy in the distribution of standard and exceptional cases might provide a classifier that is competitive in terms of accuracy and comprehensibility.

1 Introduction

In recent years case-based reasoning has gained popularity as an alternative to the rule-based approach, but psychological investigations of human problem solving (Riesbeck and Schank 1989) show that people typically use a wide spectrum of knowledge, from specific cases to very general rules. This view is supported by an integrated architecture of the two problem solving paradigms: instance-based learning and a rule-based system (Surma and Vanhoof 1995).
In this approach the classifying process is based on rules (which represent standard and/or typical situations) and cases (which represent particular experience, exceptions and/or non-typical situations). For this kind of knowledge the classifier uses the following heuristic: To classify a new case, the ordered list of rules is examined to find the first rule whose condition is satisfied by the case. If no rule condition is satisfied, then the case is classified by means of the nearest-neighbor algorithm on the exceptional cases.

The knowledge structures required by this heuristic are obtained from an input set of cases. This set is split into two disjoint subsets of exceptional and standard cases. The exceptional cases are the source of data for the instance-based component, and the standard cases are used for generating rules by means of the induction algorithm (in order to avoid overgeneralization, the exceptional cases are used during the induction process too). The problem of choosing the splitting criterion is the core of the next section.

As Domingos clearly summarized (Domingos 1996), these two problem solving paradigms appear to have complementary advantages and disadvantages. For example, instance-based classifiers are able to deal with complex frontiers from relatively few examples, but are very sensitive to irrelevant attributes. Conversely, rule induction algorithms can dispose of irrelevant attributes relatively easily, but suffer from the small disjuncts problem (i.e. rules covering few training examples have a high error rate (Holte, Acker, and Porter 1989)). Of course this combined problem solving heuristic should be applied very carefully, but its psychological roots (assuming that the input problem is a standard one and starting the solving strategy with general knowledge) provide a basis for good comprehensibility.

The experimental evaluation in human resources management (Surma and Vanhoof 1995) and bankruptcy prediction (Surma, Vanhoof, and Limere 1997) shows a good explanatory ability of the integrated approach. The rule sets generated from the standard cases are more readable for an expert than rule sets generated from all available cases. This increase in comprehensibility was obtained without a significant decrease in classification accuracy. In this paper we would like to evaluate this statement more precisely on standard machine learning databases.

The goal of the paper is twofold. First, we compare the classification accuracy of the integrated architecture with the C4.5 and 1-NN (nearest-neighbor) classifiers. Second, we empirically evaluate the different splitting criteria. In section 2 the splitting criteria are introduced. In section 3 we show the empirical results of comparisons between the different splitting criteria and classification approaches. In section 4 we present a short overview of related work on integrating case-based and rule-based reasoning. Finally, the paper concludes with some final remarks.

2 Splitting Criterion

One of the most important problems with the integrated approach is to find a suitable database splitting criterion. We took into consideration a heuristic approach based on Zhang's formalization of the family resemblance idea (Zhang 1992). It is assumed that typical cases have higher intra-class similarity and lower inter-class similarity than atypical cases. The intra-class similarity of a case (intra_sim) is defined as its average similarity to the other cases in the same class, and the inter-class similarity (inter_sim) is defined as its average similarity to all the cases in other classes.
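The two similarity measures defined above can be illustrated with a minimal sketch. The paper defers the formal definition to (Surma and Vanhoof 1995), so the pairwise similarity used here, the fraction of attributes on which two cases agree, is an assumption made for illustration only, as are the function names and the toy data.

```python
from typing import List, Sequence

Case = Sequence[str]  # assumption: purely symbolic attribute values


def overlap_similarity(a: Case, b: Case) -> float:
    """Fraction of attributes on which two cases agree (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("cases must have the same number of attributes")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def _mean_similarity(i: int, others: List[int], cases: Sequence[Case]) -> float:
    """Average similarity of case i to the cases indexed by `others`."""
    if not others:
        return 0.0  # convention chosen here when there is nothing to compare with
    return sum(overlap_similarity(cases[i], cases[j]) for j in others) / len(others)


def intra_sim(i: int, cases: Sequence[Case], labels: Sequence[str]) -> float:
    """Average similarity of case i to the other cases of its own class."""
    same = [j for j in range(len(cases)) if j != i and labels[j] == labels[i]]
    return _mean_similarity(i, same, cases)


def inter_sim(i: int, cases: Sequence[Case], labels: Sequence[str]) -> float:
    """Average similarity of case i to all cases of the other classes."""
    other = [j for j in range(len(cases)) if labels[j] != labels[i]]
    return _mean_similarity(i, other, cases)


# Toy data (hypothetical): three binary attributes, two classes.
cases = [("y", "y", "n"), ("y", "y", "y"), ("y", "n", "n"),
         ("n", "n", "n"), ("n", "n", "y")]
labels = ["a", "a", "a", "b", "b"]

print(intra_sim(0, cases, labels), inter_sim(0, cases, labels))
```

In this toy set, case 2 is as similar to the other class as to its own, which is the kind of border case the family resemblance idea treats as atypical.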
This issue is formally described in our previous paper (Surma and Vanhoof 1995). Let us introduce two typicality functions (for each case c from the input set of cases):

    typicality-i(c) = intra_sim(c)
    typicality-ii(c) = intra_sim(c) - inter_sim(c)

These functions reflect two ways of interpreting the exceptional cases. Cases with a small value of typicality-i are interpreted as "outliers", commonly placed outside the main cluster of their class. The typicality-ii function (based on Zhang's resemblance idea) is computed in the context of the other classes, and consequently its exceptions are placed on the borders between classes.

Now we can apply these typicality functions to split an input set of cases. The optimum splitting point (with accuracy as the objective function) might vary from one database to another, but it can be established experimentally. By means of a given typicality function we can order the cases from the most typical to the least typical. Based on this order we can evaluate the integrated approach experimentally by testing different splitting points for each typicality function. In the next section we present precisely that experiment.

3 Empirical Results

Database

For the experiment, 5 publicly accessible data sets from the UCI Repository of Machine Learning Databases at the University of California (Irvine) were selected. The variety of the databases' characteristics is shown in Table 1.

Table 1. Database characteristics

Characteristic       Voting    Zoo    Crx   Monk    Led
Size                    435    101    690    556   1000
No. of attributes        16     17     15      6      7
No. of classes            2      7      2      2     10
Symbolic values           +      +      +      +      +

Numeric values: + +
Unknown values: +
Noise: +
[For the last three rows the transcription does not preserve which databases the marks belong to.]

where: Voting: U.S. Congressional voting 1984, Zoo: the Zoo database, Crx: Japanese credit screening, Monk: the Monk problem, Led: the digital LED display domain.

The experiments were conducted by means of 10-fold cross-validation.
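The ordering-and-splitting procedure of Section 2, together with a sweep over several splitting points, might be sketched as follows. This is only an illustration under stated assumptions: the pairwise similarity (fraction of matching attributes), the function names, the rounding of the split size and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Case = Sequence[str]  # assumption: non-empty tuples of symbolic values


def similarity(a: Case, b: Case) -> float:
    """Fraction of attributes on which two cases agree (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_sim(i: int, group: List[int], cases: Sequence[Case]) -> float:
    """Average similarity of case i to the cases in `group` (0.0 if empty)."""
    if not group:
        return 0.0
    return sum(similarity(cases[i], cases[j]) for j in group) / len(group)


def typicality_i(i: int, cases: Sequence[Case], labels: Sequence[str]) -> float:
    """typicality-i(c) = intra_sim(c)."""
    same = [j for j in range(len(cases)) if j != i and labels[j] == labels[i]]
    return mean_sim(i, same, cases)


def typicality_ii(i: int, cases: Sequence[Case], labels: Sequence[str]) -> float:
    """typicality-ii(c) = intra_sim(c) - inter_sim(c)."""
    same = [j for j in range(len(cases)) if j != i and labels[j] == labels[i]]
    other = [j for j in range(len(cases)) if labels[j] != labels[i]]
    return mean_sim(i, same, cases) - mean_sim(i, other, cases)


def split_by_typicality(
    cases: Sequence[Case],
    labels: Sequence[str],
    typicality: Callable[[int, Sequence[Case], Sequence[str]], float],
    standard_fraction: float,
) -> Tuple[List[int], List[int]]:
    """Return (standard, exceptional) case indices for a given splitting point."""
    n = len(cases)
    # Order from the most typical to the least typical case.
    order = sorted(range(n), key=lambda i: typicality(i, cases, labels), reverse=True)
    n_standard = round(standard_fraction * n)  # rounding rule is an assumption
    return order[:n_standard], order[n_standard:]


# Toy data (hypothetical): three binary attributes, two classes.
cases = [("y", "y", "n"), ("y", "y", "y"), ("y", "n", "n"),
         ("n", "n", "n"), ("n", "n", "y")]
labels = ["a", "a", "a", "b", "b"]

for pct in (0, 25, 50, 75, 100):
    standard, exceptions = split_by_typicality(cases, labels, typicality_ii, pct / 100)
    print(pct, standard, exceptions)
```

In a full experiment each split would be computed on the training folds only, with the rules induced from the standard cases and the exceptions kept for the nearest-neighbor component.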
In all experiments the rules were generated with the help of Quinlan's C4.5 machine learning programs (Quinlan 1993). Figures 1 to 5 show the results of the comparison between the splitting functions typicality-i and typicality-ii. The X dimension in every graph is normalized to represent the output after 10 tests for 0% of standard cases (100% exceptions), 25% (75% exceptions), and so on up to 100% (0% exceptions). The results for 0% standard cases are those of the 1-NN classifier, because all the learning cases are interpreted as exceptions. The results for 100% standard cases are those of the C4.5 classifier, because all the learning cases are interpreted as standard ones. That is why we can read from the graphs not only the difference between the splitting criteria but also the accuracy comparison between the integrated approach and a representative of each of its parent methods: 1-NN and C4.5.

Surprisingly, there is no difference in the shapes of the curves between the typicality functions. This means that the overlap between classes in the investigated databases is considerable and the number of outliers is not significant. For this kind of database the exceptions are mainly placed on the borders between classes, and consequently the typicality-i function generates almost the same ordering of cases as the typicality-ii function. The typicality-i function performed slightly better than the typicality-ii function, but the differences are not statistically significant (in all comparisons ANOVA was used at the 0.05 level).
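The integrated classifier evaluated here follows the heuristic from the Introduction: try the ordered rules first, and fall back to nearest-neighbor classification on the exceptional cases. The sketch below is a simplified illustration, not the authors' implementation. Rules are represented as hypothetical (condition, label) pairs rather than C4.5 output, the similarity measure is an assumed attribute-overlap count, and the attribute names are invented.

```python
from typing import Callable, Dict, List, Optional, Tuple

Case = Dict[str, str]
Rule = Tuple[Callable[[Case], bool], str]  # (condition, class label), hypothetical form
LabeledCase = Tuple[Case, str]             # (exceptional case, class label)


def similarity(a: Case, b: Case) -> float:
    """Fraction of a's attributes whose values match in b (assumed measure)."""
    if not a:
        return 0.0
    return sum(b.get(key) == value for key, value in a.items()) / len(a)


def classify(
    case: Case,
    rules: List[Rule],
    exceptions: List[LabeledCase],
    default: Optional[str] = None,
) -> Optional[str]:
    """Rules first; if no rule fires, 1-NN on the exceptional cases."""
    for condition, label in rules:
        if condition(case):
            return label  # first satisfied rule in the ordered list wins
    if not exceptions:
        return default  # nothing to compare with: fall back to a default class
    _, label = max(exceptions, key=lambda e: similarity(case, e[0]))
    return label


# Hypothetical toy knowledge in the spirit of the Zoo database.
rules: List[Rule] = [(lambda c: c["feathers"] == "yes", "bird")]
exceptions: List[LabeledCase] = [
    ({"feathers": "no", "milk": "yes"}, "mammal"),
    ({"feathers": "no", "milk": "no"}, "reptile"),
]

print(classify({"feathers": "yes", "milk": "no"}, rules, exceptions))
print(classify({"feathers": "no", "milk": "yes"}, rules, exceptions))
```

With an empty rule list the function reduces to plain 1-NN, and with an empty exception list it reduces to the rule set plus a default class, which mirrors the 0% and 100% end points of the graphs.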

[Figure 1. Experimental results on the Voting database (curves: typ.I and typ.II; x-axis: 0, 25, 50, 75, 100).]

[Figure 2. Experimental results on the Zoo database (curves: typ.I and typ.II; x-axis: 0, 25, 50, 75, 100).]

[Figure 3. Experimental results on the Crx database (curves: typ.I and typ.II; x-axis: 0, 25, 50, 75, 100).]

[Figure 4. Experimental results on the Monk database (curves: typ.I and typ.II; x-axis: 0, 25, 50, 75, 100).]

[Figure 5. Experimental results on the Led database (curves: typ.I and typ.II; x-axis: 0, 25, 50, 75, 100).]

As expected, the trends in the curves depend on the characteristics of the database. For example, in the symbol-oriented Voting database, an increase in the number of standard cases (more rules) increases the classification accuracy. On the other hand, in the Crx database (a lot of numerical data) we observe the completely inverse trend. It is easy to notice that in this case the instance-based approach performed significantly better than C4.5.

For two databases we obtained statistically significant differences between the worst result of the integrated approach and one of its parent methods. In the Crx database the 1-NN classifier is significantly better, and in the Monk database both classifiers, 1-NN and C4.5, are better than the integrated approach. It should be underlined that during the experiments C4.5 fell back on the default rule when the number of standard cases was small (i.e. in the Zoo database), and this decreased the performance of the integrated approach. Nevertheless, the results are clear and show that no significant synergy was obtained by combining an instance-based and a rule-based approach in the way presented in this paper. This does not mean that this way of integration is unreasonable, but an intensive empirical study is needed (especially on artificial databases with a significant number of outliers) in order to understand completely the trade-off between accuracy and comprehensibility.

4 Related Work

The integration of instance-based (or, more generally, case-based) and rule-based methods has attracted a lot of research. For this paper we summarize only three approaches, probably the most significant ones. Rissland and Skalak described the CABARET system, which integrates reasoning with rules and reasoning with previous cases (Rissland and Skalak 1991). This integration was performed via a collection of control heuristics. Golding and Rosenbloom proposed the ANAPRON system for combining rule-based and case-based reasoning for the task of pronouncing surnames (Golding and Rosenbloom 1991, 1996). The central idea of their approach is to apply the rules to a target problem to get a first approximation to the answer, but if the problem is judged to be compellingly similar to a known exception to the rules, then the solution is based on the exception rather than on the rules. Last but not least, there is the unifying approach implemented in the RISE algorithm by Domingos (Domingos 1996). RISE solves the classification task by an intelligent search for the best mixture of selected cases and increasingly abstract rules, and finally classification is performed using a best-match strategy.

5 Conclusions

The research reported here attempted to combine the instance-based and rule-based problem solving techniques in a single architecture based on standard and exceptional cases. Contrary to our previous experience concerning the good comprehensibility of this approach, the presented empirical evaluation has not shown this attempt to be successful in terms of accuracy. The empirical results on the splitting criteria show that only a careful "splitting" policy may give an accurate and comprehensible classifier.

References

Domingos, P. 1996. Unifying Instance-Based and Rule-Based Induction. Machine Learning 24:144-168.

Golding, A.R., Rosenbloom, P.S. 1991. Improving Rule-Based Systems through Case-Based Reasoning. In Proceedings of the 19th National Conference on Artificial Intelligence, 22-27. The MIT Press.

Golding, A.R., Rosenbloom, P.S. 1996. Improving accuracy by combining rule-based and case-based reasoning. Artificial Intelligence 87:215-254.

Holte, R., Acker, L.E., Porter, B.W. 1989. Concept learning and the problem of small disjuncts. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 813-818. Morgan Kaufmann.

Quinlan, J.R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo.

Riesbeck, C.K., Schank, R.C. 1989. Inside Case-Based Reasoning. Lawrence Erlbaum, Hillsdale.

Rissland, E.L., Skalak, D.B. 1991. CABARET: rule integration in a hybrid architecture. International Journal of Man-Machine Studies 34:839-887.

Surma, J., Vanhoof, K. 1995. Integrating Rules and Cases for the Classification Task. In Proceedings of the First International Case-Based Reasoning Conference - ICCBR 95, 325-334. Springer Verlag.

Surma, J., Vanhoof, K., Limere, A. 1997. Integrating Rules and Cases for Data Mining in Financial Databases. In Proceedings of the 9th International Conference on Artificial Intelligence Applications - EXPERSYS 97, 61-66. IITT-International.

Zhang, J. 1992. Selecting Typical Instances in Instance-Based Learning. In Proceedings of the 9th International Conference on Machine Learning, 470-479. Morgan Kaufmann.