NEURAL NETWORK TECHNIQUES IN MANAGERIAL PATTERN RECOGNITION

By
SHOUHONG WANG

A Thesis
Submitted to the School of Graduate Studies
in Partial Fulfilment of the Requirements
for the Degree
Doctor of Philosophy

McMaster University
(c) Copyright by Shouhong Wang, June 1990

NEURAL NETWORK TECHNIQUES IN MANAGERIAL PATTERN RECOGNITION

DOCTOR OF PHILOSOPHY (1990)          McMASTER UNIVERSITY
(Management Science / Systems)       Hamilton, Ontario

TITLE: Neural Network Techniques in Managerial Pattern Recognition
AUTHOR: Shouhong Wang, B.Eng. (Tsinghua University, China), M.B.A. (Tsinghua University, China)
SUPERVISOR: Dr. Norman P. Archer
NUMBER OF PAGES: xiii, 228

ABSTRACT

The management area includes a large class of pattern recognition (classification) problems. Traditionally, these problems have been solved by using statistical methods or expert systems. In practice, however, statistical assumptions about the probability distributions of the pattern variables are often not verifiable, and expertise concerning the correct classification is often not explicitly available. These obstacles may make statistical methods and expert system techniques difficult to apply. Since the early 1980s neural network techniques have been widely used in pattern recognition, especially after Rumelhart's back propagation learning algorithm was adapted to the solution of these problems. The standard neural network, using the back propagation learning algorithm, requires no statistical assumptions but uses training sample data to generate classification boundaries, allowing it to perform pattern recognition. In this dissertation the neural network's behavior in classification boundary generation is analyzed. Based on this analysis, three models are developed. The first model improves the classification performance of neural networks in managerial pattern recognition by modifying the training algorithm through the use of monotonicity. Using simulated and real data, the developed model is tested and verified. The second model solves bias problems caused by small sample size in neural network classification results. The third model develops multi-architecture neural networks to supply decision makers with more natural pattern recognition information, based on fuzzy theory.
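As a concrete illustration of the point that back propagation learns classification boundaries from training samples alone, without distributional assumptions, here is a minimal sketch in modern Python (not the dissertation's code; the two-class sample, network sizes, learning rate, and iteration count are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented two-class training sample: label 1 when x1 + x2 > 1, else 0.
X = rng.uniform(size=(200, 2))
t = (X[:, 0] + X[:, 1] > 1.0).astype(float).reshape(-1, 1)
Xb = np.hstack([X, np.ones((200, 1))])      # constant input carrying the threshold weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

U = rng.normal(scale=0.5, size=(3, 4))      # input (+bias) -> 4 hidden nodes
V = rng.normal(scale=0.5, size=(5, 1))      # hidden (+bias) -> output node
eta = 1.0                                   # learning rate

for _ in range(5000):
    h = sigmoid(Xb @ U)                               # hidden node outputs
    hb = np.hstack([h, np.ones((200, 1))])
    y = sigmoid(hb @ V)                               # network output
    delta_out = (y - t) * y * (1 - y)                 # error signal at the output node
    delta_hid = (delta_out @ V[:4].T) * h * (1 - h)   # back-propagated error signal
    V -= eta * hb.T @ delta_out / 200                 # gradient descent on squared error
    U -= eta * Xb.T @ delta_hid / 200

print(np.mean((y > 0.5) == (t > 0.5)))      # fraction of the sample correctly classified
```

Note that the weight updates use only the error signal measured on the sample; nothing about the probability distribution of the pattern variables is assumed.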

ACKNOWLEDGEMENTS

This dissertation owes its existence to many people whose help I deeply appreciate and would like to acknowledge here. I first would like to thank my supervisor, Dr. N. P. Archer, for his invaluable advice and guidance throughout this research. His help has gone beyond the call of supervisory duties. I shall always be grateful to him for his patience, and his time and energy spent working with me. I also wish to thank Dr. G. W. Torrance and Dr. D. P. Taylor for serving on my supervisory committee. More generally I would like to thank other people at McMaster University who have assisted me in completing my graduate studies, especially including Dr. W. G. Truscott, Dr. M. W. L. Chan, Dr. G. O. Wesolowsky, and Dr. M. Parlar. Finally, my sincere appreciation and love go to my parents, my wife, and my son for their support and sacrifices while I have been apart from them during my years of graduate studies.

TABLE OF CONTENTS

Notation ... ix

Chapter One: Introduction ... 1
1.1. General Description of Neural Networks ... 1
1.2. Background ... 5
1.2.1. A Class of Problems in Managerial Decision Making - Pattern Recognition / Classification ... 5
1.2.2. Problem Definitions ... 9
1.2.2.1. Managerial Pattern Recognition ... 9
1.2.2.2. Characteristics of Classification Problems ... 10
1.3. Pattern Recognition Techniques ... 16
1.3.1. Review of Pattern Recognition Techniques ... 16
1.3.1.1. Bayes Rule ... 16
1.3.1.2. Discriminant Functions ... 17
1.3.1.3. Sequential Classification ... 19
1.3.1.4. Hierarchical Classification ... 21
1.3.1.5. Expert Systems ... 23
1.3.1.5.1. Production Rule Systems for Pattern Recognition ... 23
1.3.1.5.2. Frame-Based Systems for Pattern Recognition ... 25
1.3.1.5.3. Properties of Expert Systems in Pattern Recognition ... 26
1.3.1.6. Neural Network Techniques ... 26
1.3.2. Comparison of Classification Techniques ... 34
1.3.3. Primary Comparison of Discriminant Analysis and Neural Networks ... 36
1.3.3.1. Limitations of Discriminant Analysis ... 36
1.3.3.2. An Example for Comparison of Discriminant Analysis and Neural Networks ... 38

Chapter Two: The Neural Network Model ... 43
2.1. Overview ... 44
2.2. Computational Network Elements ... 48
2.3. Learning ... 51
2.3.1. The Learning Paradigm and The Hebbian Rule ... 51
2.3.2. Back-Propagation Least Mean Square Error Learning Algorithm ... 53
2.3.3. Issues Relevant to Learning ... 58
2.3.3.1. Learning Rate and Momentum ... 58
2.3.3.2. Convergence ... 58
2.3.3.3. Global Minima vs. Local Minima ... 59
2.3.3.4. Symmetry Breaking ... 60
2.4. Current Status of Neural Network Applications ... 61
2.4.1. Acceptance Criteria for The Neural Network Model ... 61
2.4.2. Neural Network Development Cycle ... 62

Chapter Three: Classification Boundaries in Neural Networks ... 65
3.1. Boundary Generation ... 66
3.2. Discussion of Boundary Behavior ... 73
3.2.1. Special Cases ... 74
3.2.1.1. Positive v_i's ... 74
3.2.1.2. U_i X_s' Tends to -∞ ... 74
3.2.1.3. U_i X_s' Tends to +∞ ... 75
3.2.1.4. Small θ, Small U_i X_s', and Moderate v_i's ... 76
3.2.2. Error Feedback ... 77
3.3. Neural Network Weight Matrices Are Not Statistics of Sample Random Variable X ... 81
3.4. Unpredictability of Neural Network Classification ... 81
3.4.1. Unpredictable Boundary Generation Behavior ... 82
3.4.2. Dogmatic Learning Mechanism ... 83
3.4.3. Prospective Methods for Controlling Neural Network Unpredictability ... 85
3.5. Prior Knowledge about Managerial Pattern Recognition Problems ... 86
3.5.1. Utility Functions and Monotonicity ... 86
3.5.2. A Proposed Heuristic for Improving The Neural Network BPLMS Training Algorithm ... 90
3.6. The Monotonic Function Neural Network Model ... 97
3.6.1. Monotonic Condition for Neural Networks ... 97
3.6.2. Proper Training Data Set ... 100
3.7. Example Applications of The Monotonic Function Neural Network Model ... 104
3.7.1. An Example of Simulated Data ... 104
3.7.2. An Example of Real Data ... 106
3.8. Discussion ... 109
3.8.1. Efficiency of The MF Model ... 109
3.8.2. Robustness of The MF Model ... 111

Chapter Four: Classification Bias in The Neural Networks and An Approach to Reduce Its Effect ... 112
4.1. Monotonically Separable Problem ... 112
4.2. Learning Bias ... 114
4.3. Effect of Learning Bias ... 116
4.4. A Type of Ambiguity in Managerial Pattern Recognition ... 121
4.5. A Model to Reduce The Effect of Learning Bias ... 124
4.5.1. Monotonically Separable Problem Model ... 124
4.5.2. The Integrated Neural Network to Supply Assurance Information ... 128
4.6. The Relationship Between The Monotonic Function (MF) Model and The Monotonically Separable Problem (MSP) Model ... 131
4.7. Minimum Number of Hidden Nodes ... 134

Chapter Five: Fuzzy Set Representations of Neural Network Classification Boundaries ... 138
5.1. A Problem in Statistical Classification ... 139
5.2. Fuzzy Set Concepts ... 143
5.3. The Fuzzy Set Model ... 148
5.3.1. λ-Complementation in Two Class Classification ... 148
5.3.2. Fuzzy Representation of the Typical Neural Network ... 154
5.3.3. The Fuzzy Membership Model ... 155
5.3.4. An Example Application of the Fuzzy Membership Model ... 158
5.4. Discussion ... 162
5.4.1. Practical Fuzzy Membership Function Values ... 162
5.4.2. The Relationship between MFM, MSPM and FMM ... 164

Chapter Six: Generalization to More Than Two Classes ... 165
6.1. Classifiers for k>2 Classification ... 166
6.1.1. Bayes Rule ... 166
6.1.2. Fisher's Approach ... 167
6.1.3. Linear Function Classifier ... 168
6.1.4. Other Classifiers ... 170
6.2. A Possible Extension of The k=2 MF Model ... 171
6.3. The Generalized MF Model ... 176
6.3.1. Typical Neural Network Topologies for k>2 Classification ... 176
6.3.2. A General Algorithm for Neural Networks in k>2 Classification ... 177
6.4. An Example of k>2 Classification ... 180
6.5. Discussion ... 181

Chapter Seven: Conclusions and Discussion ... 183
7.1. General Conclusions ... 183
7.2. Remarks ... 186
7.3. Future Research ... 187

Appendix I. Generating Random Sample ... 189
Appendix II. Back Propagation Algorithm ... 191
Appendix III. Monotonic Condition ... 199
Appendix IV. An Experiment on Green's Data [Green 1978] ... 203
Appendix V. An Experiment on Fisher's Data [Fisher 1936] ... 214
References ... 218

NOTATION

(Entries whose symbols were lost in transcription are marked [?].)

B: discriminant function coefficient vector in the Fisher method
B(X): discriminant function of observation X
b_i: coefficient of x_i in the discriminant function
C: set of goal classes
c: goal class, c ∈ C
C_s: class to which sample s belongs
d: number of Fisher discriminant functions for the more-than-two-group case
d_a: desired output of neural network node a
E: error measured in learning
e: power of the test in sequential classification
F: activation function in the neural network
f: distribution function
f_z(x): fuzzy function of X
G: number of clusters in the MF model algorithm
g: subscript
h: number of hidden nodes in the two-layer neural network
i: subscript
j: subscript
k: number of goal classes in the set of classes C
L: output of a hidden node in Kolmogorov's theorem
M: momentum
m: dimension of the pattern vector
n: dimension of the feature vector
net_a: net input for node a
o_a: output value of node a
o_a': output value of node a after learning sample s
p: probability
S: testing data set
s: subscript
[?]: training data set
[?]: training data sample point
T: temperature
t: time
U_i: weight vector of the inputs of the i-th hidden node in the two-layer neural network model
[?]: constant in Kolmogorov's theorem
v_i: weight value of the output of the i-th hidden node in the two-layer, single-output neural network
[?]: weight matrix between hidden layer and output layer
[?]: weight matrix between input layer and hidden layer
w_i: weight on the i-th input to the perceptron
w_PQ: weight on the connection from node P to node Q in the neural network
X: pattern vector
x_i: i-th component of X
Y: feature vector
y_i: i-th component of Y
Q: general expression of a neural network node
P: general expression of the neural network node preceding node Q
[?]: impurity function of τ
[?]: Fisher ratio
δ: error signal in the neural network learning algorithm
[?]: constant in Kolmogorov's theorem
[?]: classification score (from linear discriminant analysis)
η: learning rate
θ: threshold in F
[?]: one-dimensional feature vector value for T
[?]: subscript
λ: parameter of λ-complementation in fuzzy sets
μ: mean of a random variable
[?]: sample size of S
[?]: transfer function in Kolmogorov's theorem
[?]: small number for the training scheme model
[?]: subscript
Σ: covariance matrix of a multivariate variable
σ: covariance
τ: tree node in hierarchical classifier
[?]: desired value of the feature vector of s
Φ: cumulative normal distribution function
φ: mapping function from X to Y
[?]: transfer function in Kolmogorov's theorem
[?]: transfer function in Kolmogorov's theorem
Ω: pattern space
[?]: angle between pattern vector X and its i-th component x_i
[?]: error tolerance in the neural network learning algorithm
[?]: split in hierarchical classifier

FIGURES

Figure 1.1. General Neural Network Model ... 2
Figure 1.2. Stock Mean-Variance Chart ... 5
Figure 1.3. Pattern Recognition Machine ... 13
Figure 1.4. Decision Regions and Decision Boundary
Figure 1.5. Hierarchical Classifier ... 21
Figure 1.6. A Split of The Classification Tree ... 22
Figure 1.7. Production Rule System for Pattern Recognition ... 24
Figure 1.8. Frame Representation of Pattern Recognition ... 25
Figure 1.9. Layered Neural Network Model ... 27
Figure 1.10. Perceptron Model ... 27
Figure 1.11. Linear Decision Boundary ... 28
Figure 1.12. XOR (Exclusive OR) Problem ... 29
Figure 1.13. A Multi-Layer Neural Network Can Generate Arbitrarily Complex Decision Regions ... 29
Figure 1.14. Synthesis Problem ... 34
Figure 1.15. The Position of Various Classification Techniques in The "Glass Box - Black Box" Spectrum ... 36
Figure 1.16. Coordinate Transformation ... 39
Figure 1.17. Neural Network Results for the Exponential Boundary Example ... 41
Figure 2.1. Neural Network with One Hidden Layer ... 45
Figure 2.2. Computational Elements in a Neural Network ... 49
Figure 2.3. Activation Functions ... 49
Figure 2.4. Sigmoid Logic ... 51
Figure 2.5. Hebbian Learning Rule ... 53
Figure 2.6. Global Minima, Local Minima, and Possible Solutions ... 60
Figure 2.7. Neural Network Prototype Development ... 63
Figure 3.1. Neural Network Hypercube ... 67
Figure 3.2. Two Layer, Single Output Neural Network ... 68
Figure 3.3. Boundary Movement ... 72
Figure 3.4. Error Feedback Signal ... 80
Figure 3.5. Example of Boundary Classification Results ... 83
Figure 3.6. Complex Boundaries Violate Monotonicity ... 89
Figure 3.7. Clusters of Misclassified Points ... 92
Figure 3.8. Vector Analysis of Clusters of Misclassified Points ... 93
Figure 3.9. Vector Analysis of the Trade-off Algorithm ... 95
Figure 3.10. Main Steps of the Algorithm to Find a Proper Training Data Set ... 103
Figure 3.11. Experimental Result for The Monotonic Function Neural Network Classifier ... 105
Figure 4.1. A Monotonically Separable Classification Problem ... 113
Figure 4.2. Effect of Small Learning Rate ... 119
Figure 4.3. Frontiers of Decision Consequences ... 123
Figure 4.4. Monotonically Separable Problem Model ... 125
Figure 4.5. Determining the Frontiers of Sample Sets ... 127
Figure 4.6. An Integrated Neural Network to Supply Assurance Information ... 129
Figure 4.7. "Unbiased" Boundary ... 129
Figure 4.8. Example of Uncertain Boundary Generation ... 132
Figure 4.9. Experimental Result with Combined MF and MSP Models ... 133
Figure 5.1. Linear Discriminant Analysis ... 140
Figure 5.2. Graphs Representing Fuzzy Relationships ... 144
Figure 5.3. λ-Complementation Relationship ... 147
Figure 5.4. Fuzzy Membership Functions in the Two Class Classification Case ... 149
Figure 5.5. Fuzzy Membership Functions Implemented in Conjunction with Neural Network Classification ... 154
Figure 5.6. Fuzzy Membership Functions for the Example in Section 3.7.1 ... 161
Figure 6.1. Linear Functions in the k>2 Case (Scheme 1) ... 168
Figure 6.2. Linear Functions in the k>2 Case (Scheme 2) ... 169
Figure 6.3. Linear Functions in the k>2 Case (Scheme 3) ... 169
Figure 6.4. Split Complex Decision Region into Subclasses ... 171
Figure 6.5. Minimum Distance Classifier in Piecewise Learning Machine ... 171
Figure 6.6. Decision Region Complexity Spectrum ... 173
Figure 6.7. Decomposition of a Pattern Dimension ... 175
Figure 6.8. Two Typical Neural Network Topologies ... 177
Figure 6.9. Classifying a New Observation Falling in an Undecided Region ... 180
Figure A.1. The Behavior of R as a Function of Σ_{r=0}^{m} w_r x_r ... 201

TABLES

Table 1.1. Comparison of Classification Techniques ... 33
Table 1.2. Discriminant Analysis for the Exponential Boundary Example ... 40
Table 3.1. Example of the Heuristic Approach to Obtain a Proper Training Data Set ... 91
Table 3.2. A Comparison of the Percentage Correctly Classified by the MF Model and the LDA Method on the Alpha TV Commercial Study Data [Green 1978] ... 108
Table 5.1. A Fuzzy Relationship Between Assets and Creditworthiness ... 143

CHAPTER ONE
INTRODUCTION

1.1. GENERAL DESCRIPTION OF NEURAL NETWORKS

For our purposes, artificial neural networks are defined as massively parallel interconnected networks of simple (usually adaptive) elements and their hierarchical organizations, which are intended to react to information on the objects of the real world in a manner analogous to biological nervous systems [Kohonen 1987]. Neural networks, or simply "neural nets", may also be referred to as connectionist models, parallel distributed processing models, and neuromorphic systems [Lippmann 1987]. Neural network architecture may be described in various ways, depending upon its desired function [Lippmann 1987, Hecht-Nielsen 1987]. The most general topology of a neural network is shown in Figure 1.1. The neural network system carries out the information processing operation as a mathematical mapping φ of vector X to vector Y, so that Y = φ(X), where X is the vector of external inputs to the network, and Y is the vector of outputs. Units ("neurons") within the network may receive input signals and/or lateral feedbacks
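To make the mapping Y = φ(X) concrete, the sketch below (not from the thesis; a minimal modern illustration in Python, assuming one hidden layer of sigmoid units in the spirit of the two-layer model analyzed in Chapters Two and Three, with invented layer sizes, random initial weights, and bias terms omitted for brevity) computes a network's output for a pattern vector X:

```python
import numpy as np

def sigmoid(net):
    """Logistic activation F(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

class TwoLayerNetwork:
    """Feedforward network computing Y = phi(X) with one hidden layer.

    U holds the weight vectors U_i on the hidden nodes' inputs;
    V holds the weights v_i on the hidden nodes' outputs
    (names borrowed from the Notation list; sizes are illustrative).
    """

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.normal(scale=0.1, size=(n_hidden, n_inputs))   # input -> hidden weights
        self.V = rng.normal(scale=0.1, size=(n_outputs, n_hidden))  # hidden -> output weights

    def forward(self, x):
        hidden = sigmoid(self.U @ x)     # each hidden node i outputs F(U_i . x)
        return sigmoid(self.V @ hidden)  # output layer combines the hidden outputs

# Classify a two-dimensional pattern vector into one of two classes:
net = TwoLayerNetwork(n_inputs=2, n_hidden=4, n_outputs=1)
y = net.forward(np.array([0.3, 0.8]))
print(y)  # value in (0, 1); thresholding at 0.5 gives a two-class decision
```

Training, for example by the back-propagation algorithm reviewed in Chapter Two, would adjust U and V so that thresholding the output yields the desired classification boundary.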