Advanced Information Processing

Series Editor
Lakhmi C. Jain

Advisory Board Members
Endre Boros
Clarence W. de Silva
Stephen Grossberg
Robert J. Howlett
Michael N. Huhns
Paul B. Kantor
Charles L. Karr
Nadia Magnenat-Thalmann
Dinesh P. Mital
Toyoaki Nishida
Klaus Obermayer
Manfred Schmitt
Hisao Ishibuchi · Tomoharu Nakashima · Manabu Nii

Classification and Modeling with Linguistic Information Granules

Advanced Approaches to Linguistic Data Mining

With 217 Figures and 72 Tables

Springer
Hisao Ishibuchi
Department of Computer Science and Intelligent Systems
Osaka Prefecture University
1-1 Gakuen-cho, Sakai
Osaka 599-8531, Japan
e-mail: hisaoi@cs.osakafu-u.ac.jp

Tomoharu Nakashima
Department of Computer Science and Intelligent Systems
Osaka Prefecture University
1-1 Gakuen-cho, Sakai
Osaka 599-8531, Japan
e-mail: nakashi@cs.osakafu-u.ac.jp

Manabu Nii
Department of Electrical Engineering and Computer Sciences
Graduate School of Engineering
University of Hyogo
2167 Shosha, Himeji
Hyogo 671-2201, Japan
e-mail: nii@eng.u-hyogo.ac.jp

Library of Congress Control Number: 2004114623
ACM Subject Classification (1998): I.2
ISBN 3-540-20767-8 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springeronline.com

© Springer-Verlag Berlin Heidelberg 2005
Printed in Germany

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: by the authors
Cover design: KünkelLopka, Heidelberg
Production: LE-TeX Jelonek, Schmidt & Vöckler GbR, Leipzig
Printed on acid-free paper 45/3142/YL - 5 4 3 2 1 0
Preface

Many approaches have already been proposed for classification and modeling in the literature. These approaches are usually based on mathematical models. Computer systems can easily handle mathematical models even when they are complicated and nonlinear (e.g., neural networks). On the other hand, it is not always easy for human users to intuitively understand mathematical models even when they are simple and linear. This is because human information processing is based mainly on linguistic knowledge while computer systems are designed to handle symbolic and numerical information. A large part of our daily communication is based on words. We learn from various media such as books, newspapers, magazines, TV, and the Internet through words. We also communicate with others through words. While words play a central role in human information processing, linguistic models are not often used in the fields of classification and modeling. If there is no goal other than the maximization of accuracy in classification and modeling, mathematical models may always be preferred to linguistic models. On the other hand, linguistic models may be chosen if emphasis is placed on interpretability. The main purpose in writing this book is to clearly explain how classification and modeling can be handled in a human-understandable manner. In this book, we only use simple linguistic rules such as "If the 1st input is large and the 2nd input is small then the output is large" and "If the 1st attribute is small and the 2nd attribute is medium then the pattern is Class 2". These linguistic rules are extracted from numerical data. In this sense, our approaches to classification and modeling can be viewed as linguistic knowledge extraction from numerical data (i.e., linguistic data mining). There are many issues to be discussed in linguistic approaches to classification and modeling. The first issue is how to determine the linguistic terms used in linguistic rules.
For example, we have some linguistic terms such as young, middle-aged, and old for describing our ages. In the case of weight, we might use light, middle, and heavy. Two problems are involved in the determination of linguistic terms. One is to choose linguistic terms for each variable, and the other is to define the meaning of each linguistic term. The choice of linguistic terms is related to linguistic discretization (i.e., granulation) of each variable. The definition of the meaning of each linguistic term is performed using fuzzy logic. That is, the meaning of each linguistic term is specified by its membership function. Linguistic rules can be viewed as combinations of linguistic terms for each
variable. The main focus of this book is to find good combinations of linguistic terms for generating linguistic rules. Interpretability as well as accuracy is taken into account when we extract linguistic rules from numerical data. Various aspects are related to the interpretability of linguistic models. In this book, the following aspects are discussed:

- Granulation of each variable (i.e., the number of linguistic terms).
- Overlap between adjacent linguistic terms.
- Length of each linguistic rule (i.e., the number of antecedent conditions).
- Number of linguistic rules.

The first two aspects are related to the determination of linguistic terms. We examine the effect of these aspects on the performance of linguistic models. The other two aspects are related to the complexity of linguistic models. We examine the tradeoff between the accuracy and the complexity of linguistic models. We mainly use genetic algorithms for designing linguistic models. Genetic algorithms are used as machine learning tools as well as optimization tools. We also describe the handling of linguistic rules in neural networks. Linguistic rules and numerical data are simultaneously used as training data in the learning of neural networks. Trained neural networks are used to extract linguistic rules. While this book includes many state-of-the-art techniques in soft computing such as multi-objective genetic algorithms, genetics-based machine learning, and fuzzified neural networks, undergraduate students in computer science and related fields should be able to understand almost all parts of this book without any particular background knowledge. We have made the book as simple as possible by using many examples and figures. We explain fuzzy logic, genetic algorithms, and neural networks in an easily understandable manner where they are used in the book. This book can be used as a textbook in a one-semester course.
In this case, the last four chapters can be omitted because they include somewhat advanced topics on fuzzified neural networks. The first ten chapters clearly explain linguistic models for classification and modeling. We would like to thank Prof. Lakhmi C. Jain for giving us the opportunity to write this book. We would also like to thank Prof. Witold Pedrycz and Prof. Francisco Herrera for their useful comments on the draft version of this book. Special thanks are extended to the people who kindly assisted us in publishing this book. Mr. Ronan Nugent worked hard on the copy-editing of this book, Ms. Ulrike Stricker gave us helpful comments on the layout and production, and Mr. Ralf Gerstner, who patiently and kindly stayed in contact with us, gave us general comments. Some simulation results in this book were checked by our students. It is a pleasure to acknowledge the help of Takashi Yamamoto, Gaku Nakai, Teppei Seguchi, Yohei Shibata, Masayo Udo, Shiori Kaige, and Satoshi Namba.

Sakai, Osaka, March 2003
Hisao Ishibuchi
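As a small illustration of the kind of linguistic rule discussed in this preface, the following is a minimal Python sketch (not from the book; all function names, term definitions, and parameter values are illustrative assumptions). It defines triangular membership functions for three linguistic terms on the unit interval and computes the compatibility grade of a pattern with the antecedent "the 1st attribute is small and the 2nd attribute is medium", using the minimum operator for "and".

```python
def triangular(x, a, b, c):
    """Membership grade of x in a triangular fuzzy set rising from a,
    peaking at b, and falling back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Three illustrative linguistic terms on the domain [0, 1].
def small(x):
    return triangular(x, -0.5, 0.0, 0.5)

def medium(x):
    return triangular(x, 0.0, 0.5, 1.0)

def large(x):
    return triangular(x, 0.5, 1.0, 1.5)

def rule_compatibility(x1, x2):
    """Compatibility grade of the pattern (x1, x2) with the antecedent
    'x1 is small and x2 is medium' (minimum operator for 'and')."""
    return min(small(x1), medium(x2))
```

For example, `rule_compatibility(0.0, 0.5)` returns 1.0 (both antecedent conditions are fully satisfied), while a pattern lying between two terms receives a partial grade, which is exactly the overlap between adjacent linguistic terms discussed above.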
Contents

1. Linguistic Information Granules 1
   1.1 Mathematical Handling of Linguistic Terms 2
   1.2 Linguistic Discretization of Continuous Attributes 4

2. Pattern Classification with Linguistic Rules 11
   2.1 Problem Description 11
   2.2 Linguistic Rule Extraction for Classification Problems 12
       2.2.1 Specification of the Consequent Class 13
       2.2.2 Specification of the Rule Weight 17
   2.3 Classification of New Patterns by Linguistic Rules 20
       2.3.1 Single Winner-Based Method 20
       2.3.2 Voting-Based Method 22
   2.4 Computer Simulations 25
       2.4.1 Comparison of Four Definitions of Rule Weights 26
       2.4.2 Simulation Results on Iris Data 29
       2.4.3 Simulation Results on Wine Data 32
       2.4.4 Discussions on Simulation Results 35

3. Learning of Linguistic Rules 39
   3.1 Reward-Punishment Learning 39
       3.1.1 Learning Algorithm 39
       3.1.2 Illustration of the Learning Algorithm Using Artificial Test Problems 41
       3.1.3 Computer Simulations on Iris Data 45
       3.1.4 Computer Simulations on Wine Data 47
   3.2 Analytical Learning 47
       3.2.1 Learning Algorithm 48
       3.2.2 Illustration of the Learning Algorithm Using Artificial Test Problems 50
       3.2.3 Computer Simulations on Iris Data 54
       3.2.4 Computer Simulations on Wine Data 56
   3.3 Related Issues 57
       3.3.1 Further Adjustment of Classification Boundaries 57
       3.3.2 Adjustment of Membership Functions 62
4. Input Selection and Rule Selection 69
   4.1 Curse of Dimensionality 69
   4.2 Input Selection 70
       4.2.1 Examination of Subsets of Attributes 70
       4.2.2 Simulation Results 71
   4.3 Genetic Algorithm-Based Rule Selection 75
       4.3.1 Basic Idea 76
       4.3.2 Generation of Candidate Rules 77
       4.3.3 Genetic Algorithms for Rule Selection 80
       4.3.4 Computer Simulations 87
   4.4 Some Extensions to Rule Selection 89
       4.4.1 Heuristics in Genetic Algorithms 90
       4.4.2 Prescreening of Candidate Rules 93
       4.4.3 Computer Simulations 96

5. Genetics-Based Machine Learning 103
   5.1 Two Approaches in Genetics-Based Machine Learning 103
   5.2 Michigan-Style Algorithm 105
       5.2.1 Coding of Linguistic Rules 105
       5.2.2 Genetic Operations 105
       5.2.3 Algorithm 107
       5.2.4 Computer Simulations 108
       5.2.5 Extensions to the Michigan-Style Algorithm 111
   5.3 Pittsburgh-Style Algorithm 116
       5.3.1 Coding of Rule Sets 117
       5.3.2 Genetic Operations 117
       5.3.3 Algorithm 119
       5.3.4 Computer Simulations 119
   5.4 Hybridization of the Two Approaches 121
       5.4.1 Advantages of Each Algorithm 121
       5.4.2 Hybrid Algorithm 124
       5.4.3 Computer Simulations 125
       5.4.4 Minimization of the Number of Linguistic Rules 126

6. Multi-Objective Design of Linguistic Models 131
   6.1 Formulation of Three-Objective Problem 131
   6.2 Multi-Objective Genetic Algorithms 134
       6.2.1 Fitness Function 134
       6.2.2 Elitist Strategy 135
       6.2.3 Basic Framework of Multi-Objective Genetic Algorithms 135
   6.3 Multi-Objective Rule Selection 136
       6.3.1 Algorithm 136
       6.3.2 Computer Simulations 136
   6.4 Multi-Objective Genetics-Based Machine Learning 139
       6.4.1 Algorithm 139
       6.4.2 Computer Simulations 139

7. Comparison of Linguistic Discretization with Interval Discretization 143
   7.1 Effects of Linguistic Discretization 144
       7.1.1 Effect in the Rule Generation Phase 144
       7.1.2 Effect in the Classification Phase 146
       7.1.3 Summary of Effects of Linguistic Discretization 147
   7.2 Specification of Linguistic Discretization from Interval Discretization 147
       7.2.1 Specification of Fully Fuzzified Linguistic Discretization 147
       7.2.2 Specification of Partially Fuzzified Linguistic Discretization 150
   7.3 Comparison Using Homogeneous Discretization 151
       7.3.1 Simulation Results on Iris Data 151
       7.3.2 Simulation Results on Wine Data 154
   7.4 Comparison Using Inhomogeneous Discretization 155
       7.4.1 Entropy-Based Inhomogeneous Interval Discretization 156
       7.4.2 Simulation Results on Iris Data 157
       7.4.3 Simulation Results on Wine Data 158

8. Modeling with Linguistic Rules 161
   8.1 Problem Description 161
   8.2 Linguistic Rule Extraction for Modeling Problems 162
       8.2.1 Linguistic Association Rules for Modeling Problems 163
       8.2.2 Specification of the Consequent Part 165
       8.2.3 Other Approaches to Linguistic Rule Generations 166
       8.2.4 Estimation of Output Values by Linguistic Rules 169
       8.2.5 Standard Fuzzy Reasoning 169
       8.2.6 Limitations and Extensions 172
       8.2.7 Non-Standard Fuzzy Reasoning Based on the Specificity of Each Linguistic Rule 174
   8.3 Modeling of Nonlinear Fuzzy Functions 177

9. Design of Compact Linguistic Models 181
   9.1 Single-Objective and Multi-Objective Formulations 181
       9.1.1 Three Objectives in the Design of Linguistic Models 181
       9.1.2 Handling as a Single-Objective Optimization Problem 182
       9.1.3 Handling as a Three-Objective Optimization Problem 183
   9.2 Multi-Objective Rule Selection 185
       9.2.1 Candidate Rule Generation 185
       9.2.2 Candidate Rule Prescreening 185
       9.2.3 Three-Objective Genetic Algorithm for Rule Selection 187
       9.2.4 Simple Numerical Example 189
   9.3 Fuzzy Genetics-Based Machine Learning 190
       9.3.1 Coding of Rule Sets 192
       9.3.2 Three-Objective Fuzzy GBML Algorithm 192
       9.3.3 Simple Numerical Example 194
       9.3.4 Some Heuristic Procedures 194
   9.4 Comparison of Two Schemes 196

10. Linguistic Rules with Consequent Real Numbers 199
    10.1 Consequent Real Numbers 199
    10.2 Local Learning of Consequent Real Numbers 201
        10.2.1 Heuristic Specification Method 201
        10.2.2 Incremental Learning Algorithm 203
    10.3 Global Learning 205
        10.3.1 Incremental Learning Algorithm 206
        10.3.2 Comparison Between Two Learning Schemes 207
    10.4 Effect of the Use of Consequent Real Numbers 208
        10.4.1 Resolution of Adjustment 208
        10.4.2 Simulation Results 210
    10.5 Twin-Table Approach 211
        10.5.1 Basic Idea 212
        10.5.2 Determination of Consequent Linguistic Terms 213
        10.5.3 Numerical Example 215

11. Handling of Linguistic Rules in Neural Networks 219
    11.1 Problem Formulation 220
        11.1.1 Approximation of Linguistic Rules 220
        11.1.2 Multi-Layer Feedforward Neural Networks 221
    11.2 Handling of Linguistic Rules Using Membership Values 222
        11.2.1 Basic Idea 222
        11.2.2 Network Architecture 223
        11.2.3 Computer Simulation 223
    11.3 Handling of Linguistic Rules Using Level Sets 225
        11.3.1 Basic Idea 225
        11.3.2 Network Architecture 226
        11.3.3 Computer Simulation 226
    11.4 Handling of Linguistic Rules Using Fuzzy Arithmetic 228
        11.4.1 Basic Idea 228
        11.4.2 Fuzzy Arithmetic 228
        11.4.3 Network Architecture 230
        11.4.4 Computer Simulation 233

12. Learning of Neural Networks from Linguistic Rules 235
    12.1 Back-Propagation Algorithm 235
    12.2 Learning from Linguistic Rules for Classification Problems 237
        12.2.1 Linguistic Training Data 237
        12.2.2 Cost Function 237
        12.2.3 Extended Back-Propagation Algorithm 238
        12.2.4 Learning from Linguistic Rules and Numerical Data 241
    12.3 Learning from Linguistic Rules for Modeling Problems 245
        12.3.1 Linguistic Data 245
        12.3.2 Cost Function 245
        12.3.3 Extended Back-Propagation Algorithm 246
        12.3.4 Learning from Linguistic Rules and Numerical Data 247

13. Linguistic Rule Extraction from Neural Networks 251
    13.1 Neural Networks and Linguistic Rules 252
    13.2 Linguistic Rule Extraction for Modeling Problems 252
        13.2.1 Basic Idea 253
        13.2.2 Extraction of Linguistic Rules 253
        13.2.3 Computer Simulations 254
    13.3 Linguistic Rule Extraction for Classification Problems 258
        13.3.1 Basic Idea 259
        13.3.2 Extraction of Linguistic Rules 259
        13.3.3 Computer Simulations 263
        13.3.4 Rule Extraction Algorithm 265
        13.3.5 Decreasing the Measurement Cost 267
    13.4 Difficulties and Extensions 270
        13.4.1 Scalability to High-Dimensional Problems 271
        13.4.2 Increase of Excess Fuzziness in Fuzzy Outputs 271

14. Modeling of Fuzzy Input-Output Relations 277
    14.1 Modeling of Fuzzy Number-Valued Functions 277
        14.1.1 Linear Fuzzy Regression Models 278
        14.1.2 Fuzzy Rule-Based Systems 280
        14.1.3 Fuzzified Takagi-Sugeno Models 281
        14.1.4 Fuzzified Neural Networks 283
    14.2 Modeling of Fuzzy Mappings 285
        14.2.1 Linear Fuzzy Regression Models 285
        14.2.2 Fuzzy Rule-Based Systems 286
        14.2.3 Fuzzified Takagi-Sugeno Models 286
        14.2.4 Fuzzified Neural Networks 287
    14.3 Fuzzy Classification 287
        14.3.1 Fuzzy Classification of Non-Fuzzy Patterns 288
        14.3.2 Fuzzy Classification of Interval Patterns 291
        14.3.3 Fuzzy Classification of Fuzzy Patterns 291
        14.3.4 Effect of Fuzzification of Input Patterns 292

Index 304