PERFORMANCE ANALYSIS OF PROBABILISTIC POTENTIAL FUNCTION NEURAL NETWORK CLASSIFIER

GURSEL SERPEN 1 AND HONG JIANG
Electrical Engineering & Computer Science Department, University of Toledo, Toledo, OH 43606

LLOYD G. ALLRED
Software Engineering Division, Ogden Air Logistics Center, Hill AFB, Ogden, UT 84056

Abstract: A simulation analysis of the recently proposed Probabilistic Potential Function Neural Network (PPFNN) classifier algorithm was performed on a set of benchmark problems: the IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer Disease, Cleveland Heart Disease and Thyroid Gland Disease data sets. The performance of the PPFNN algorithm on these benchmarks was compared to that of other prominent neural network classification algorithms, including the Multi-Layer Perceptron Network, Learning Vector Quantizer Network, Radial Basis Function Network, and Probabilistic Neural Network. Specifically, the classification performance of each algorithm was studied. Simulation results indicate that the PPFNN offers a set of desirable features as well as top-tier classification performance, which render it a viable choice among neuro-classifiers.

1 The authors gratefully acknowledge grant support for Dr. Gursel Serpen by the Air Force Office of Scientific Research under the Summer Faculty Research Extension Program.

INTRODUCTION

There is a large set of neural network paradigms in the literature addressing pattern recognition problems (Werbos, 1991). Significant neuro-classifier paradigms include the multi-layer perceptron (MLP) (Werbos, 1991 & 1994), the learning vector quantizer (LVQ) (Kohonen, 1991), the radial basis function network (RBF) (Poggio, 1994 & Parzen, 1962) and the probabilistic neural network (PNN) (Specht, 1987), among others. Each of these neural pattern classifiers has a number of shortcomings which render it inapplicable to pattern classification tasks that require the neural network paradigm to possess the following properties:

1. offer fast (real-time) training and classification cycles even if implemented in software,
2. not require an initial guess for the network topology, but rather adapt topologically to a particular instance of the classification problem at hand in a dynamic way as training progresses,
3. discover clustering properties of the training data and adapt to a minimal network topology in terms of needed computational resources,
4. implement an incremental learning procedure and hence not disturb the previous state of the network, but simply add new computational resources to the existing network topology,
5. require a small number of heuristically specified parameters, with network performance insensitive to large variations in the values of those parameters, and
6. form classification boundaries which optimally separate classes that are likely to be formed from a set of disconnected subclasses in the pattern space; the joint probability density function of a particular class is likely to have many modes.

A new neural network algorithm which has the potential to perform better than any of the four paradigms discussed above on a stochastic pattern classification problem and, at the same time, does not suffer from the shortcomings associated with each paradigm was introduced in reference (Serpen et al., 1996). In brief, the proposed neural network algorithm, called the Probabilistic Potential Function Neural Network (PPFNN), theoretically possesses all six properties stated above. In this paper, we demonstrate through simulation studies that the PPFNN offers classification performance comparable to those of the RBF and PNN, fast training and classification cycles, and insensitivity to large variations in the value of its single heuristically determined parameter.

The PPFNN employs four feedforward layers to implement a stochastic decision-making rule. The first layer is the pattern layer, which has a node count equal to the dimensionality of the patterns.
Nodes in the hidden layer loosely represent the cluster centers in the data set and are connected to output layer nodes through modifiable weights, w_ij = γ_k, where γ_k is an element in a sequence of positive reals (e.g., the harmonic sequence {1/k}, k = 1, 2, ...) for training pattern k, hidden node i and output node j. These weights are determined using the training algorithm presented in Figure 1. The output layer has as many nodes as there are classes. The fourth and final layer is basically a MAXNET (Pao, 1989).

Nodes in the pattern layer simply distribute the incoming signal values to hidden layer nodes without any weighting. The mapping in the hidden layer nodes is a function of the form given by the following equation:

    exp(-α ||x - x_k||^2)                                         (1)

where α (Alpha) is the spread parameter of the exponential function centered at x_k. Outputs of hidden layer nodes go through trainable weights and feed the output layer nodes. Output layer nodes sum the incoming weighted signals and pass the weighted sum through a non-linear function defined as in the following equation:

    output = { 0,              if weighted_sum < 0
             { weighted_sum,   if weighted_sum ∈ [0, 1]           (2)
             { 1,              if weighted_sum > 1
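As a concrete illustration, the hidden-layer mapping of Eq. (1) and the clamped output non-linearity of Eq. (2) can be sketched in a few lines of Python; the function names below are illustrative choices for this sketch, not from the original formulation:

```python
import numpy as np

def hidden_activation(x, center, alpha):
    """Eq. (1): Gaussian potential function exp(-alpha * ||x - x_k||^2)."""
    d = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(np.exp(-alpha * np.dot(d, d)))

def output_node(hidden_outputs, weights):
    """Eq. (2): weighted sum of hidden activations, clamped to [0, 1]."""
    s = float(np.dot(hidden_outputs, weights))
    return min(max(s, 0.0), 1.0)
```

Note that the activation peaks at 1 when the input pattern coincides with the node's center x_k, and decays with squared Euclidean distance at a rate set by Alpha.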
The final layer is a MAXNET which chooses the node with the highest input excitation value and sets its output to 1, while setting the outputs of the remaining nodes to zero. Each node in the MAXNET layer receives input from only one node in the output layer, without any weighting.

1. Initialize the PPFNN 2 and assume a value for the parameter Alpha.
2. Present a new feature vector (using k as the index for feature vectors) and compute the network output.
3. If the network classifies the vector correctly for each class, no action is needed.
4. Else
   A. Add a new hidden layer node (using i as the index for hidden nodes),
   B. Center the potential function represented by the new hidden layer node around this vector, and
   C. Repeat for each class (using j as the index for classes):
      If the pattern belongs to the class and the function f_k 3 is positive, no action is needed.
      Else if the pattern does not belong to the class and f_k is negative, no action is needed.
      Else if the pattern belongs to the class and f_k is negative, connect the output of hidden node i to the output node for class j through a weight of +γ_k.
      Else if the pattern does not belong to the class and f_k is positive, connect the output of hidden node i to the output node for class j through a weight of -γ_k.
5. Repeat the procedure until all training patterns are processed.

Figure 1. Pseudocode for the PPFNN Algorithm

PERFORMANCE ANALYSIS

A comprehensive simulation analysis of the newly proposed PPFNN algorithm on a set of benchmark classification problems has been performed (Jiang, 1997). The benchmark problems are the IRIS, Sonar, Vowel Recognition, 2-Spiral, Wisconsin Breast Cancer Disease, Cleveland Heart Disease and Thyroid Gland Disease data sets. The performance of the PPFNN has been compared to those of the MLP, LVQ, RBF and PNN on these benchmark problems; see the Appendix for neural network parameter settings.
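The Figure 1 procedure above can be rendered as a minimal Python sketch. The class and method names (PPFNN, fit, predict) are this sketch's own choices, and it assumes the Gaussian potential function of Eq. (1) and the harmonic sequence γ_k = 1/k; it is not a reconstruction of the authors' implementation:

```python
import numpy as np

class PPFNN:
    """Illustrative sketch of the Figure 1 training loop (not the authors' code)."""

    def __init__(self, n_classes, alpha):
        self.n_classes = n_classes
        self.alpha = alpha          # spread of the potential functions
        self.centers = []           # one center x_k per hidden node
        self.weights = []           # per hidden node: signed gammas, one per class

    def _potential(self, x, center):
        d = x - center
        return np.exp(-self.alpha * np.dot(d, d))   # Eq. (1)

    def _f(self, x):
        """Per-class discriminant: weighted sum of potential functions."""
        if not self.centers:
            return np.zeros(self.n_classes)
        phi = np.array([self._potential(x, c) for c in self.centers])
        return phi @ np.array(self.weights)

    def fit(self, X, y):
        for k, (x, label) in enumerate(zip(np.asarray(X, float), y), start=1):
            gamma = 1.0 / k                          # harmonic sequence {1/k}
            f = self._f(x)
            # Correct iff f is positive for the true class, negative otherwise.
            correct = all((f[j] > 0) if j == label else (f[j] < 0)
                          for j in range(self.n_classes))
            if correct:
                continue
            # Add a hidden node centered at this pattern (steps 4A-4B) ...
            self.centers.append(x.copy())
            w = np.zeros(self.n_classes)
            for j in range(self.n_classes):          # ... and wire it up (step 4C)
                if j == label and f[j] <= 0:
                    w[j] = +gamma                    # reinforce the true class
                elif j != label and f[j] >= 0:
                    w[j] = -gamma                    # suppress a wrong class
            self.weights.append(w)

    def predict(self, x):
        # MAXNET layer: winner-take-all over the output nodes.
        return int(np.argmax(self._f(np.asarray(x, float))))
```

On a toy two-class problem this incremental scheme only spends hidden nodes on patterns the current network misclassifies, which is the source of the algorithm's minimal-topology property.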
2 The number of nodes in the pattern layer is equal to the number of features in the pattern vectors. Initially, the hidden layer has a single node centered at a randomly chosen training pattern, and there are as many nodes in the output layer as there are classes.

3 The iterative formula f_k(x) = f_{k-1}(x) ± γ_k K(x, x_k) computes the function f_k(x), where K(x, x_k) is the potential function (Tou & Gonzalez, 1981), and the coefficients γ_k can be obtained from the harmonic sequence {1/k}, k = 1, 2, ...

It is important to note that none of the neural network architectures were
rigorously optimized: this was done to observe each neural classifier's first-order computational potential. Simulation results are presented in Tables 1 through 3. Because the PNN is applicable only to two-class classification tasks, it was not employed for the IRIS, Vowel and Thyroid Gland Disease data sets.

Simulation results presented in Tables 1 through 3 indicate that the PPFNN requires minimal training time while offering leading classification performance for all benchmark problems tested. The classification performance of the PPFNN surpasses all other algorithms for the IRIS and Wisconsin Breast Cancer data sets. It is a close second to the top-performing neuro-classifier algorithm for the Sonar, Vowel, Cleveland Heart Disease and Thyroid Gland Disease data sets. Overall, the classification performance of the PPFNN is superior to that of the MLP and LVQ and comparable to those of the RBF and PNN.

TABLE 1: CLASSIFICATION PERFORMANCES OF NEURAL NETWORK ALGORITHMS
(Test Data Classification Rate in %)

Data Set     MLP     LVQ     RBF     PNN     PPFNN
2-Spiral 4   50.00   55.73   98.96   89.58   91.67
IRIS         78.00   82.67   80.00   -       96.00
Sonar        53.85   62.98   71.15   74.04   73.08
Vowel        36.57   11.11   56.67   -       52.32
Wisconsin    59.94   87.88   66.67   95.15   95.76
Cleveland    55.17   57.93   65.86   55.86   58.28
Thyroid      36.74   81.86   72.09   -       78.14

The training time requirement of the PPFNN is generally on the order of the training time requirements of the RBF and PNN, as presented in Table 2. The PPFNN has the minimum training time for the 2-Spiral, IRIS and Wisconsin Breast Cancer data sets. Its training time requirements are second lowest, after the RBF, for the Sonar, Vowel, Cleveland Heart Disease and Thyroid Gland Disease data sets. In all cases, the difference in training time requirements between the RBF and the PPFNN is relatively small. The sensitivity of PPFNN performance to large variations of the potential function spread parameter, Alpha, is shown in Table 3.
Results indicate that the classification performance of the PPFNN is not significantly sensitive to variations in Alpha for all problems except the Thyroid Gland Disease data set. Classification performance varies by no more than roughly 5% for the Sonar, Vowel, Wisconsin Breast Cancer and Cleveland Heart Disease data sets. The largest change in classification performance due to variation in Alpha occurs for the Thyroid Gland Disease data set: 21.74% for training data and 23.26% for test data.

4 Training and testing data sets are the same for this problem.
TABLE 2: TRAINING TIME (IN SECONDS) REQUIREMENTS OF NEURAL NETWORK ALGORITHMS

         2-Spiral   IRIS    Sonar   Vowel   Wisconsin   Cleveland   Thyroid
MLP      3556       12572   47400   52800   10317       8834        9746
LVQ      1500       1320    7800    5237    1653        1835        450
RBF      120        120     360     3600    137         495         65
PNN      120        -       886     -       220         2532        -
PPFNN    120        120     621     3777    98          1010        120

TABLE 3: SENSITIVITY OF CLASSIFICATION PERFORMANCE OF PPFNN AS ALPHA VARIES

Benchmark Problem   Testing Interval   Max. Variation in Classification Performance
                    for Alpha          Training Data        Test Data
Sonar               [4.0, 15]          1.24%                2.89%
Vowel               [3.5, 15]          3.18%                3.03%
Wisconsin           [0.8, 6.8]         0.00%                1.82%
Cleveland           [6.8, 18.8]        6.39%                4.14%
Thyroid Disease     [0.0012, 12]       21.74%               23.26%

CONCLUSIONS

Simulation results demonstrate that PPFNN performance is comparable to or better than that of the MLP, RBF, LVQ and PNN on the set of seven benchmark problems considered: the IRIS, Sonar, Vowel, 2-Spiral, Wisconsin Breast Cancer Disease, Cleveland Heart Disease and Thyroid Gland Disease data sets. The performance criteria employed in the simulation study were network training time and classification rate on test data. Simulation results indicated that the PPFNN consistently performed in the leading group of classifiers over the set of problems tested, which was not the case for the other neuro-classifier algorithms. The training time requirements of the PPFNN were generally minimal, leading to the conclusion that the PPFNN algorithm is a good choice for real-time implementation. PPFNN performance was not significantly affected by large variations in the value of its only adjustable parameter, Alpha, which determines the spread of the potential functions. In conclusion, simulation results indicate that the PPFNN is a robust neuro-classifier algorithm suitable for real-time environments.

REFERENCES

Jiang, H., (1997). Performance Analysis of Probabilistic Potential Function Neural Network Classifier.
Master's Thesis, The University of Toledo, Toledo, OH.
Kohonen, T., (1991). Improved Versions of Learning Vector Quantization, IJCNN '91 Proceedings, (1), 545-550.

Serpen, G., Allred, L. G. and Cios, K. J., (1996). Probabilistic Potential Function Neural Network Classifier, ICNN '96 Proceedings, Vol. Special Sessions, 193-198.

Specht, D. F., (1987). Probabilistic Neural Networks for Classification, Mapping, or Associative Memory, IJCNN '87 Proceedings, (1), 525-532.

Parzen, E., (1962). On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics, (33), 1065-1076.

Poggio, T., (1994). Regularization Theory, Radial Basis Functions and Networks, From Statistics to Neural Nets: Theory and Pattern Recognition Applications, NATO ASI Series, (136), 83-104.

Tou, J. T. and Gonzalez, R. C., (1981). Pattern Recognition Principles, Addison-Wesley Publishing Company, Reading, MA.

Pao, Y-H., (1989). Adaptive Pattern Recognition and Neural Networks, Addison-Wesley Publishing Company, Reading, MA.

Werbos, P. J., (1994). The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, John Wiley & Sons, Inc., New York, NY.

Werbos, P. J., (1991). Links Between Artificial Neural Networks and Statistical Pattern Recognition, in Artificial Neural Networks and Statistical Pattern Recognition: Old and New Connections, I. K. Sethi and A. K. Jain (Editors), 11-31.

APPENDIX: PARAMETER VALUES FOR NEURAL NETWORK CLASSIFIERS

                           2-Spiral   IRIS    Sonar   Vowel   Wisconsin   Cleveland   Thyroid
MLP    Learning Rate       0.001      0.01    2.0     2.0     2.0         0.001       0.001
MLP    Momentum Constant   0.5        0.9     0.0     0.0     0.0         0.5         0.5
LVQ    Learning Rate       0.01       0.01    0.01    0.01    0.01        0.01        0.01
RBF    Basis Func. Spread  1.8        1.8     1.8     1.8     1.8         1.8         1.8
PNN    Basis Func. Spread  1.8        -       5.3     -       6.8         3.8         -
PPFNN  Alpha               1.8        0.46    4.4     9.0     6.8         3.8         0.0055