NEURAL NETWORK TECHNIQUES IN MANAGERIAL PATTERN RECOGNITION

By SHOUHONG WANG

A Thesis Submitted to the School of Graduate Studies in Partial Fulfilment of the Requirements for the Degree Doctor of Philosophy

McMaster University
(c) Copyright by Shouhong Wang, June 1990
NEURAL NETWORK TECHNIQUES IN MANAGERIAL PATTERN RECOGNITION
DOCTOR OF PHILOSOPHY (1990)                    McMASTER UNIVERSITY
(Management Science / Systems)                 Hamilton, Ontario

TITLE: Neural Network Techniques in Managerial Pattern Recognition
AUTHOR: Shouhong Wang, B.Eng. (Tsinghua University, China)
                       MBA (Tsinghua University, China)
SUPERVISOR: Dr. Norman P. Archer
NUMBER OF PAGES: xiii, 228
ABSTRACT

The management area includes a large class of pattern recognition (classification) problems. Traditionally, these problems have been solved by using statistical methods or expert systems. In practice, however, statistical assumptions about the probability distributions of the pattern variables are often not verifiable, and expertise concerning the correct classification is often not explicitly available. These obstacles may make statistical methods and expert system techniques difficult to apply. Since the early 1980s neural network techniques have been widely used in pattern recognition, especially after Rumelhart's back propagation learning algorithm was adapted to the solution of these problems. The standard neural network, using the back propagation learning algorithm, requires no statistical assumptions but uses training sample data to generate classification boundaries, allowing it to perform pattern recognition. In this dissertation the neural network's behavior in classification boundary generation is analyzed. Based on this analysis, three models are developed. The first model improves the classification performance of neural networks in managerial pattern recognition by modifying the training algorithm through the use of monotonicity. Using simulated and real data, the developed model is tested and verified. The second model solves bias problems caused by small sample size in neural network classification results. The third model develops multi-architecture neural networks to supply decision makers with more natural pattern recognition information, based on fuzzy theory.
ACKNOWLEDGEMENTS

This dissertation owes its existence to many people whose help I deeply appreciate and would like to acknowledge here. I first would like to thank my supervisor, Dr. N. P. Archer, for his invaluable advice and guidance throughout this research. His help has gone beyond the call of supervisory duties. I shall always be grateful to him for his patience, and his time and energy spent working with me. I also wish to thank Dr. G. W. Torrance and Dr. D. P. Taylor for serving on my supervisory committee. More generally I would like to thank other people at McMaster University who have assisted me in completing my graduate studies, especially including Dr. W. G. Truscott, Dr. M. W. L. Chan, Dr. G. O. Wesolowsky, and Dr. M. Parlar. Finally, my sincere appreciation and love go to my parents, my wife, and my son for their support and sacrifices while I have been apart from them during my years of graduate studies.
TABLE OF CONTENTS
                                                                        Page
Notation .................................................................ix

Chapter One: Introduction .................................................1
  1.1. General Description of Neural Networks ..............................1
  1.2. Background ..........................................................5
    1.2.1. A Class of Problems in Managerial Decision Making:
           Pattern Recognition / Classification ............................5
    1.2.2. Problem Definitions .............................................9
      1.2.2.1. Managerial Pattern Recognition ..............................9
      1.2.2.2. Characteristics of Classification Problems .................10
  1.3. Pattern Recognition Techniques .....................................16
    1.3.1. Review of Pattern Recognition Techniques .......................16
      1.3.1.1. Bayes Rule .................................................16
      1.3.1.2. Discriminant Functions .....................................17
      1.3.1.3. Sequential Classification ..................................19
      1.3.1.4. Hierarchical Classification ................................21
      1.3.1.5. Expert Systems .............................................23
        1.3.1.5.1. Production Rule Systems for Pattern Recognition ........23
        1.3.1.5.2. Frame-Based Systems for Pattern Recognition ............25
        1.3.1.5.3. Properties of Expert Systems in Pattern Recognition ....26
      1.3.1.6. Neural Network Techniques ..................................26
    1.3.2. Comparison of Classification Techniques ........................34
    1.3.3. Primary Comparison of Discriminant Analysis and
           Neural Networks ................................................36
      1.3.3.1. Limitations of Discriminant Analysis .......................36
      1.3.3.2. An Example for Comparison of Discriminant Analysis
               and Neural Networks ........................................38

Chapter Two: The Neural Network Model ....................................43
  2.1. Overview ...........................................................44
  2.2. Computational Network Elements .....................................48
  2.3. Learning ...........................................................51
    2.3.1. The Learning Paradigm and the Hebbian Rule .....................51
    2.3.2. Back-Propagation Least Mean Square Error Learning
           Algorithm ......................................................53
    2.3.3. Issues Relevant to Learning ....................................58
      2.3.3.1. Learning Rate and Momentum .................................58
      2.3.3.2. Convergence ................................................58
      2.3.3.3. Global Minima vs. Local Minima .............................59
      2.3.3.4. Symmetry Breaking ..........................................60
  2.4. Current Status of Neural Network Applications ......................61
    2.4.1. Acceptance Criteria for the Neural Network Model ...............61
    2.4.2. Neural Network Development Cycle ...............................62

Chapter Three: Classification Boundaries in Neural Networks ..............65
  3.1. Boundary Generation ................................................66
  3.2. Discussion of Boundary Behavior ....................................73
    3.2.1. Special Cases ..................................................74
      3.2.1.1. Positive v_i's .............................................74
      3.2.1.2. U_iX_s' tends to -∞ ........................................74
      3.2.1.3. U_iX_s' tends to +∞ ........................................75
      3.2.1.4. Small θ, small U_iX', and moderate v_i's ...................76
    3.2.2. Error Feedback .................................................77
  3.3. Neural Network Weight Matrices Are Not Statistics of
       Sample Random Variable X ..........................................81
  3.4. Unpredictability of Neural Network Classification ..................81
    3.4.1. Unpredictable Boundary Generation Behavior .....................82
    3.4.2. Dogmatic Learning Mechanism ....................................83
    3.4.3. Prospective Methods for Controlling Neural Network
           Unpredictability ...............................................85
  3.5. Prior Knowledge about Managerial Pattern Recognition Problems ......86
    3.5.1. Utility Functions and Monotonicity .............................86
    3.5.2. A Proposed Heuristic for Improving the Neural Network
           BPLMS Training Algorithm .......................................90
  3.6. The Monotonic Function Neural Network Model ........................97
    3.6.1. Monotonic Condition for Neural Networks ........................97
    3.6.2. Proper Training Data Set ......................................100
  3.7. Example Applications of the Monotonic Function Neural
       Network Model ....................................................104
    3.7.1. An Example of Simulated Data ..................................104
    3.7.2. An Example of Real Data .......................................106
  3.8. Discussion ........................................................109
    3.8.1. Efficiency of the MF Model ....................................109
    3.8.2. Robustness of the MF Model ....................................111

Chapter Four: Classification Bias in the Neural Networks and
              an Approach to Reduce Its Effect ..........................112
  4.1. Monotonically Separable Problem ...................................112
  4.2. Learning Bias .....................................................114
  4.3. Effect of Learning Bias ...........................................116
  4.4. A Type of Ambiguity in Managerial Pattern Recognition .............121
  4.5. A Model to Reduce the Effect of Learning Bias .....................124
    4.5.1. Monotonically Separable Problem Model .........................124
    4.5.2. The Integrated Neural Network to Supply Assurance
           Information ...................................................128
  4.6. The Relationship Between the Monotonic Function (MF) Model
       and the Monotonically Separable Problem (MSP) Model ..............131
  4.7. Minimum Number of Hidden Nodes ....................................134

Chapter Five: Fuzzy Set Representations of Neural Network
              Classification Boundaries .................................138
  5.1. A Problem in Statistical Classification ...........................139
  5.2. Fuzzy Set Concepts ................................................143
  5.3. The Fuzzy Set Model ...............................................148
    5.3.1. λ-Complementation in Two Class Classification .................148
    5.3.2. Fuzzy Representation of the Typical Neural Network ............154
    5.3.3. The Fuzzy Membership Model ....................................155
    5.3.4. An Example Application of the Fuzzy Membership Model ..........158
  5.4. Discussion ........................................................162
    5.4.1. Practical Fuzzy Membership Function Values ....................162
    5.4.2. The Relationship between MFM, MSPS and FMM ....................164

Chapter Six: Generalization to More Than Two Classes ....................165
  6.1. Classifiers for k>2 Classification ................................166
    6.1.1. Bayes Rule ....................................................166
    6.1.2. Fisher's Approach .............................................167
    6.1.3. Linear Function Classifier ....................................168
    6.1.4. Other Classifiers .............................................170
  6.2. A Possible Extension of the k=2 MF Model ..........................171
  6.3. The Generalized MF Model ..........................................176
    6.3.1. Typical Neural Network Topologies for k>2 Classification ......176
    6.3.2. A General Algorithm for Neural Networks in k>2
           Classification ................................................177
  6.4. An Example of k>2 Classification ..................................180
  6.5. Discussion ........................................................181

Chapter Seven: Conclusions and Discussion ...............................183
  7.1. General Conclusions ...............................................183
  7.2. Remarks ...........................................................186
  7.3. Future Research ...................................................187

Appendix I.   Generating Random Sample ...................................189
Appendix II.  Back Propagation Algorithm .................................191
Appendix III. Monotonic Condition ........................................199
Appendix IV.  An Experiment on Green's Data [Green 1978] .................203
Appendix V.   An Experiment on Fisher's Data [Fisher 1936] ...............214
References ...............................................................218
NOTATION

B        discriminant function coefficient vector in the Fisher method
B(X)     discriminant function of observation X
b_i      coefficient of x_i in the discriminant function
C        set of goal classes
C_c      goal class
C_s      class to which sample s belongs
d        number of Fisher discriminant functions for the more-than-two-group case
d_a      desired output of neural network node a
E        error measured in learning
e        power of the test in sequential classification
F        activation function in the neural network
f        distribution function
fz(x)    fuzzy function of X
G        number of clusters in the MF model algorithm
g        subscript
h        number of hidden nodes in a two-layer neural network
i, j     subscripts
k        number of goal classes in the set of classes C
L        output of a hidden node in Kolmogorov's theorem
M        momentum
m        dimension of the pattern vector
n        dimension of the feature vector
net_a    net input for node a
O_a      output value of node a
O_a^s    output value of node a after learning sample s
P        probability
         testing data set
s        subscript
         training data set
         training data sample point
         temperature
         time
U_i      weight vector of the inputs of the i-th hidden node in the two-layer neural network model
         constant in Kolmogorov's theorem
v_i      weight value of the output of the i-th hidden node in the two-layer, single-output-dimension neural network
V        weight matrix between hidden layer and output layer
U        weight matrix between input layer and hidden layer
w_i      weight on the i-th input to the perceptron
w_PQ     weight on the connection from node P to node Q in the neural network
X        pattern vector
x_i      i-th component of X
Y        feature vector
y_i      i-th component of Y
Q        general expression of a neural network node
P        general expression of the neural network node preceding node Q
         impurity function of r
         Fisher ratio
δ        error signal in the neural network learning algorithm
         constant in Kolmogorov's theorem
         classification score (from linear discriminant analysis)
η        learning rate
θ        threshold in F
         one-dimensional feature vector value for T
         subscript
λ        parameter of λ-complementation in fuzzy sets
μ        mean of a random variable
         sample size of S
         transfer function in Kolmogorov's theorem
         small number for the training scheme model
         subscript
Σ        covariance matrix of a multivariate variable
         covariance
         tree node in a hierarchical classifier
         desired value of the feature vector of s
Φ        cumulative normal distribution function
         mapping function from X to Y
         transfer function in Kolmogorov's theorem
         transfer function in Kolmogorov's theorem
Ω        pattern space
         angle between pattern vector X and its i-th component x_i
ε        error tolerance in the neural network learning algorithm
         split in a hierarchical classifier
FIGURES
                                                                        Page
Figure 1.1.  General Neural Network Model ..................................2
Figure 1.2.  Stock Mean-Variance Chart .....................................5
Figure 1.3.  Pattern Recognition Machine
Figure 1.4.  Decision Regions and Decision Boundary .......................13
Figure 1.5.  Hierarchical Classifier ......................................21
Figure 1.6.  A Split of The Classification Tree ...........................22
Figure 1.7.  Production Rule System for Pattern Recognition ...............24
Figure 1.8.  Frame Representation of Pattern Recognition ..................25
Figure 1.9.  Layered Neural Network Model .................................27
Figure 1.10. Perceptron Model .............................................27
Figure 1.11. Linear Decision Boundary .....................................28
Figure 1.12. XOR (Exclusive OR) Problem ...................................29
Figure 1.13. A Multi-layer Neural Network Can Generate Arbitrarily
             Complex Decision Regions .....................................29
Figure 1.14. Synthesis Problem ............................................34
Figure 1.15. The Position of Various Classification Techniques in
             The "Glass Box - Black Box" Spectrum .........................36
Figure 1.16. Coordinate Transformation ....................................39
Figure 1.17. Neural Network Results for the Exponential Boundary
             Example ......................................................41
Figure 2.1.  Neural Network with One Hidden Layer .........................45
Figure 2.2.  Computational Elements in a Neural Network ...................49
Figure 2.3.  Activation Functions .........................................49
Figure 2.4.  Sigmoid Logic ................................................51
Figure 2.5.  Hebbian Learning Rule ........................................53
Figure 2.6.  Global Minima, Local Minima, and Possible Solutions ..........60
Figure 2.7.  Neural Network Prototype Development .........................63
Figure 3.1.  Neural Network Hypercube .....................................67
Figure 3.2.  Two Layer, Single Output Neural Network ......................68
Figure 3.3.  Boundary Movement ............................................72
Figure 3.4.  Error Feedback Signal ........................................80
Figure 3.5.  Example of Boundary Classification Results ...................83
Figure 3.6.  Complex Boundaries Violate Monotonicity ......................89
Figure 3.7.  Clusters of Misclassified Points .............................92
Figure 3.8.  Vector Analysis of Clusters of Misclassified Points ..........93
Figure 3.9.  Vector Analysis of the Trade-off Algorithm ...................95
Figure 3.10. Main Steps of the Algorithm to Find a Proper Training
             Data Set ....................................................103
Figure 3.11. Experimental Result For The Monotonic Function Neural
             Network Classifier ..........................................105
Figure 4.1.  A Monotonically Separable Classification Problem ............113
Figure 4.2.  Effect of Small Learning Rate η .............................119
Figure 4.3.  Frontiers of Decision Consequences ..........................123
Figure 4.4.  Monotonically Separable Problem Model .......................125
Figure 4.5.  Determining the Frontiers of Sample Sets ....................127
Figure 4.6.  An Integrated Neural Network to Supply Assurance
             Information .................................................129
Figure 4.7.  "Unbiased" Boundary .........................................129
Figure 4.8.  Example of Uncertain Boundary Generation ....................132
Figure 4.9.  Experimental Result With Combined MF and MSP Models .........133
Figure 5.1.  Linear Discriminant Analysis ................................140
Figure 5.2.  Graphs Representing Fuzzy Relationships .....................144
Figure 5.3.  λ-Complementation Relationship ..............................147
Figure 5.4.  Fuzzy Membership Functions in the Two Class
             Classification Case .........................................149
Figure 5.5.  Fuzzy Membership Functions Implemented In Conjunction
             With Neural Network Classification ..........................154
Figure 5.6.  Fuzzy Membership Functions for the Example in
             Section 3.7.1. ..............................................161
Figure 6.1.  Linear Functions in the k>2 Case (Scheme 1) .................168
Figure 6.2.  Linear Functions in the k>2 Case (Scheme 2) .................169
Figure 6.3.  Linear Functions in the k>2 Case (Scheme 3) .................169
Figure 6.4.  Split Complex Decision Region Into Subclasses ...............171
Figure 6.5.  Minimum Distance Classifier in Piecewise Learning
             Machine .....................................................171
Figure 6.6.  Decision Region Complexity Spectrum .........................173
Figure 6.7.  Decomposition of a Pattern Dimension ........................175
Figure 6.8.  Two Typical Neural Network Topologies .......................177
Figure 6.9.  Classifying a New Observation Falling in an Undecided
             Region ......................................................180
Figure A.1.  The Behavior of R as a Function of Σ_{i=0}^{m} w_i x_i ......201
TABLES
                                                                        Page
Table 1.1. Comparison of Classification Techniques ........................33
Table 1.2. Discriminant Analysis for the Exponential Boundary
           Example ........................................................40
Table 3.1. Example of the Heuristic Approach to Obtain a Proper
           Training Data Set ..............................................91
Table 3.2. A Comparison of the Percentage Correctly Classified by
           the MF Model and the LDA Method on the Alpha TV
           Commercial Study Data [Green 1978] ............................108
Table 5.1. A Fuzzy Relationship Between Assets and Creditworthiness ......143
CHAPTER ONE
INTRODUCTION

1.1. GENERAL DESCRIPTION OF NEURAL NETWORKS

For our purposes, artificial neural networks are defined as massively parallel interconnected networks of simple (usually adaptive) elements and their hierarchical organizations, which are intended to react to information of the objects of the real world in a manner analogous to biological nervous systems [Kohonen 1987]. Neural networks, or simply "neural nets", may also be referred to as connectionist models, parallel distributed processing models, and neuromorphic systems [Lippmann 1987]. Neural network architecture may be described in various ways, depending upon its desired function [Lippmann 1987, Hecht-Nielsen 1987]. The most general topology of a neural network is shown in Figure 1.1. The neural network system carries out the information processing operation as a mathematical mapping φ of vector X to vector Y, so that Y = φ(X), where X is the vector of external inputs to the network, and Y is the vector of outputs. Units ("neurons") within the network may receive input signals and/or lateral feedbacks
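The mapping view of the network can be illustrated with a minimal sketch (modern Python, not the implementation used in this thesis; the weight values and dimensions below are arbitrary illustrations). Following the notation list, each hidden node i applies the activation function F (here, the sigmoid) to the inner product of its input weight vector U_i with the pattern vector X, and a single output node combines the hidden outputs through weights v_i:

```python
import math

def sigmoid(z):
    # Sigmoid activation function F, as used in back-propagation networks.
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, U, v):
    """Two-layer, single-output network: y = F(sum_i v_i * F(U_i . x)).

    x: input pattern vector X
    U: list of hidden-node input weight vectors U_i
    v: output weights v_i of the hidden nodes
    (Thresholds theta are omitted here for brevity.)
    """
    hidden = [sigmoid(sum(u * xj for u, xj in zip(U_i, x))) for U_i in U]
    return sigmoid(sum(vi * h for vi, h in zip(v, hidden)))

# Illustrative (hypothetical) weights: 2 inputs, 2 hidden nodes, 1 output.
U = [[0.5, -0.3], [1.2, 0.8]]
v = [1.0, -2.0]
y = forward([0.7, 0.1], U, v)
assert 0.0 < y < 1.0  # sigmoid output always lies in (0, 1)
```

In a classification setting, the scalar output y would be compared with a cutoff (e.g., 0.5) to assign the input pattern to one of two goal classes; training adjusts U and v so that this boundary separates the classes in the training sample.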