PERFORMANCE ANALYSIS OF PROBABILISTIC POTENTIAL FUNCTION NEURAL NETWORK CLASSIFIER


GURSEL SERPEN [1] AND HONG JIANG
Electrical Engineering & Computer Science Department, University of Toledo, Toledo, OH 43606

LLOYD G. ALLRED
Software Engineering Division, Ogden Air Logistics Center, Hill AFB, Ogden, UT 84056

[1] The authors gratefully acknowledge grant support for Dr. Gursel Serpen by the Air Force Office of Scientific Research under the Summer Faculty Research Extension Program.

Abstract: A simulation analysis of the recently proposed Probabilistic Potential Function Neural Network classifier algorithm on a set of benchmark problems was performed. The benchmark problems included the IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer Disease, Cleveland Heart Disease and Thyroid Gland Disease data sets. The performance of the Probabilistic Potential Function Neural Network algorithm on these benchmark problems was compared to the performance of other important neural network classification algorithms, including the Multi-Layer Perceptron Network, the Learning Vector Quantizer Network, the Radial Basis Function Network, and the Probabilistic Neural Network. Specifically, the classification performance of each algorithm was studied. Simulation results indicate that the Probabilistic Potential Function Neural Network offers a set of desirable features as well as top-tier classification performance, which render it a viable choice among neuro-classifiers.

INTRODUCTION

There is a large set of neural network paradigms in the literature addressing pattern recognition problems (Werbos, 1991). Significant neuro-classifier paradigms include the multi-layer perceptron (MLP) (Werbos, 1991 & 1994), the learning vector quantizer (LVQ) (Kohonen, 1991), the radial basis function network (RBF) (Poggio, 1994 & Parzen, 1962) and the probabilistic neural network (PNN) (Specht, 1987), among others. Each of these neural pattern classifiers has a number of shortcomings which render it unsuitable for pattern classification tasks that require the neural network paradigm to possess the following properties:

1. offer fast (real-time) training and classification cycles even if implemented in software,
2. not require an initial guess for the network topology, but

rather topologically adapt to the particular instance of the classification problem at hand in a dynamic way as training progresses,
3. discover the clustering properties of the training data and adapt to a minimal network topology in terms of needed computational resources,
4. implement an incremental learning procedure and hence not disturb the previous state of the network, but simply add new computational resources to the existing network topology,
5. require a small number of parameters to be specified heuristically, while the network performance remains insensitive to large variations in the values of those parameters, and
6. form classification boundaries which optimally separate classes that are likely to be formed from a set of disconnected subclasses in the pattern space; the joint probability density function of a particular class is likely to have many modes.

A new neural network algorithm which has the potential to perform better than any of the four paradigms discussed above on a stochastic pattern classification problem and, at the same time, does not suffer from the shortcomings associated with each paradigm was introduced in (Serpen et al., 1996). In brief, the proposed neural network algorithm, called the Probabilistic Potential Function Neural Network (PPFNN), theoretically possesses all six properties stated above. In this paper, we demonstrate through simulation studies that the PPFNN offers classification performance comparable to that of the RBF and PNN, fast training and classification cycles, and insensitivity to large variations in the value of its single heuristically determined parameter.

The PPFNN employs four feedforward layers to implement a stochastic decision-making rule. The first layer is the pattern layer, which has a node count equal to the dimensionality of the patterns. Nodes in the hidden layer loosely represent the cluster centers in the data set and are connected to output layer nodes through modifiable weights w_ij = γ_k^ij, where γ_k^ij is an element of a sequence of positive reals (e.g., the harmonic sequence given by {1/k}, k = 1, 2, ...) for training pattern k, hidden node i and output node j. These weights are determined using the training algorithm presented in Figure 1. The output layer has as many nodes as there are classes. The fourth and final layer is basically a MAXNET (Pao, 1989). Nodes in the pattern layer simply distribute the incoming signal values to hidden layer nodes without any weighting. The mapping in the hidden layer nodes is a function of the form given by the following equation:

    exp(-α ||x - x_k||^2)                                         (1)

where α (Alpha) is the spread parameter of the exponential function centered at x_k. Outputs of hidden layer nodes pass through trainable weights and feed the inputs of the output layer nodes. Output layer nodes sum the incoming weighted signals and pass the weighted sum through a non-linear function defined by the following equation:

    output = 0               if weighted_sum < 0
             weighted_sum    if weighted_sum in [0, 1]            (2)
             1               if weighted_sum > 1
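For concreteness, the following Python sketch (illustrative only, not taken from the original implementation) traces a forward pass through the four layers described above: exponential potential functions in the hidden layer per equation (1), a weighted sum clamped to [0, 1] at the output layer per equation (2), and a MAXNET winner-take-all final layer. The function and argument names are assumptions introduced here.

import numpy as np

def ppfnn_forward(x, centers, alpha, W):
    # x       : (d,) input pattern
    # centers : (h, d) hidden-node centers x_k, one per stored training pattern
    # alpha   : spread parameter of the exponential potential function, Eq. (1)
    # W       : (h, c) hidden-to-output weights (the +/- gamma terms)

    # Hidden layer: exponential potential function centered at each x_k, Eq. (1)
    hidden = np.exp(-alpha * np.sum((centers - x) ** 2, axis=1))

    # Output layer: weighted sum clamped to the interval [0, 1], Eq. (2)
    weighted_sum = hidden @ W
    outputs = np.clip(weighted_sum, 0.0, 1.0)

    # MAXNET layer: set the output of the most excited node to 1, all others to 0
    decision = np.zeros_like(outputs)
    decision[np.argmax(outputs)] = 1.0
    return decision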

The final layer is a MAXNET that chooses the node with the highest input excitation value and sets its output to 1, while setting the outputs of the remaining nodes to zero. Each node in the MAXNET layer receives input from only one node in the output layer, without any weighting.

1. Initialize the PPFNN [2] and assume a value for the parameter Alpha.
2. Present a new feature vector (using k as the index for feature vectors) and compute the network output.
3. If the network classifies the vector correctly for each class, no action is needed.
4. Else
   A. Add a new hidden layer node (using i as the index for hidden nodes),
   B. Center the potential function represented by the new hidden layer node at this vector, and
   C. Repeat for each class (using j as the index for classes):
      If the pattern belongs to the class and the function f_k [3] is positive, no action is needed.
      Else if the pattern does not belong to the class and the function f_k is negative, no action is needed.
      Else if the pattern belongs to the class and the function f_k is negative, connect the output of hidden node i to the output node for class j through a weight of +γ_k^ij.
      Else if the pattern does not belong to the class and the function f_k is positive, connect the output of hidden node i to the output node for class j through a weight of -γ_k^ij.
5. Repeat the procedure until all training patterns are processed.

Figure 1. Pseudocode for the PPFNN training algorithm.

[2] The number of nodes in the pattern layer is equal to the number of features in the pattern vectors. Initially, the hidden layer has a single node centered at a randomly chosen training pattern, and there are as many nodes in the output layer as there are classes.

[3] The iterative formula f_k(x) = f_{k-1}(x) ± γ_k K(x, x_k) computes the function f_k(x), where K(x, x_k) is the potential function (Tou & Gonzalez, 1981), and the coefficients γ_k can be obtained from the harmonic sequence {1/k}, k = 1, 2, ...
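The incremental training rule of Figure 1 can be sketched in Python as follows. This is an illustrative interpretation assuming the harmonic-sequence choice γ_k = 1/k, not the authors' implementation; the class name, method names, and the per-class sign test used for step 3 are assumptions introduced here.

import numpy as np

class PPFNNSketch:
    def __init__(self, n_classes, alpha):
        self.alpha = alpha
        self.n_classes = n_classes
        self.centers = []   # one potential-function center per added hidden node
        self.weights = []   # one row of per-class weights (+/- gamma_k) per hidden node

    def _f(self, x):
        # Per-class discriminant: weighted sum of potential functions (see footnote 3)
        if not self.centers:
            return np.zeros(self.n_classes)
        diffs = np.asarray(self.centers) - x
        hidden = np.exp(-self.alpha * np.sum(diffs ** 2, axis=1))
        return hidden @ np.asarray(self.weights)

    def predict(self, x):
        # Eq. (2) clamping followed by the MAXNET winner-take-all decision
        return int(np.argmax(np.clip(self._f(x), 0.0, 1.0)))

    def train(self, X, y):
        for k, (x, label) in enumerate(zip(np.asarray(X, float), y), start=1):
            f = self._f(x)
            member = np.eye(self.n_classes)[label].astype(bool)
            # Step 3: if every class output already has the correct sign, do nothing
            if np.all((f > 0) == member):
                continue
            gamma_k = 1.0 / k                      # harmonic sequence {1/k}
            # Steps 4A-4B: add a hidden node centered at the current pattern
            self.centers.append(x)
            row = np.zeros(self.n_classes)
            # Step 4C: connect the new node only to classes whose sign disagrees
            for j in range(self.n_classes):
                if member[j] and f[j] <= 0:
                    row[j] = +gamma_k
                elif not member[j] and f[j] >= 0:
                    row[j] = -gamma_k
            self.weights.append(row)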

PERFORMANCE ANALYSIS

A comprehensive simulation analysis of the newly proposed PPFNN algorithm on a set of benchmark classification problems has been performed (Jiang, 1997). The benchmark problems are the IRIS, Sonar, Vowel Recognition, 2-Spiral, Wisconsin Breast Cancer Disease, Cleveland Heart Disease and Thyroid Gland Disease data sets. The performance of the PPFNN has been compared to that of the MLP, LVQ, RBF and PNN on these benchmark problems; see the Appendix for the neural network parameter settings. It is important to note that none of the neural network architectures were rigorously optimized; this was done to observe each neural classifier's first-order computational potential.

Simulation results are presented in Tables 1 through 3. Because the PNN is applicable to two-class classification tasks only, it was not employed for the IRIS, Vowel and Thyroid Gland Disease data sets. The results in Tables 1 through 3 indicate that the PPFNN requires minimal training time while offering leading classification performance for all benchmark problems tested. The classification performance of the PPFNN surpasses all other algorithms for the IRIS and Wisconsin Breast Cancer data sets. It is a close second to the top-performing neuro-classifier algorithm for the Sonar, Vowel, Cleveland Heart Disease and Thyroid Gland Disease data sets. Overall, the classification performance of the PPFNN is superior to that of the MLP and LVQ, and comparable to that of the RBF and PNN.

TABLE 1: CLASSIFICATION PERFORMANCES OF NEURAL NETWORK ALGORITHMS (CLASSIFICATION RATE ON TEST DATA, IN %)

Test Data      MLP     LVQ     RBF     PNN     PPFNN
2-Spiral [4]   50.00   55.73   98.96   89.58   91.67
IRIS           78.00   82.67   80.00   -       96.00
Sonar          53.85   62.98   71.15   74.04   73.08
Vowel          36.57   11.11   56.67   -       52.32
Wisconsin      59.94   87.88   66.67   95.15   95.76
Cleveland      55.17   57.93   65.86   55.86   58.28
Thyroid        36.74   81.86   72.09   -       78.14

[4] Training and testing data sets are the same for this problem.

The training time requirement of the PPFNN is, in general, on the order of the training time requirements of the RBF and PNN, as presented in Table 2. The PPFNN has the minimum training time for the 2-Spiral, IRIS and Wisconsin Breast Cancer data sets. Its training time requirements are the second lowest, after the RBF, for the Sonar, Vowel, Cleveland Heart Disease and Thyroid Gland Disease data sets. In all cases, the difference in training time between the RBF and the PPFNN is relatively small.

The sensitivity of PPFNN performance to large variations of the potential function spread parameter, Alpha, is shown in Table 3. The results indicate that the classification performance of the PPFNN is not significantly sensitive to variations in the Alpha value for all problems except the Thyroid Gland Disease data set. Classification performance varies by at most about 5% for the Sonar, Vowel, Wisconsin Breast Cancer and Cleveland Heart Disease data sets. The largest change in classification performance due to variation in the Alpha value occurs for the Thyroid Gland Disease data set: 21.74% for the training data and 23.26% for the test data.

TABLE 2: TRAINING TIME REQUIREMENTS (IN SECONDS) OF NEURAL NETWORK ALGORITHMS

         2-Spiral   IRIS    Sonar   Vowel   Wisconsin   Cleveland   Thyroid
MLP      3556       12572   47400   52800   10317       8834        9746
LVQ      1500       1320    7800    5237    1653        1835        450
RBF      120        120     360     3600    137         495         65
PNN      120        -       886     -       220         2532        -
PPFNN    120        120     621     3777    98          1010        120

TABLE 3: SENSITIVITY OF PPFNN CLASSIFICATION PERFORMANCE AS ALPHA VARIES

Benchmark Problem   Testing Interval   Maximum Variation in Classification   Maximum Variation in Classification
                    for Alpha          Performance (Training Data)           Performance (Test Data)
Sonar               [4.0, 15]          1.24%                                 2.89%
Vowel               [3.5, 15]          3.18%                                 3.03%
Wisconsin           [0.8, 6.8]         0.00%                                 1.82%
Cleveland           [6.8, 18.8]        6.39%                                 4.14%
Thyroid Disease     [0.0012, 12]       21.74%                                23.26%

CONCLUSIONS

Simulation results demonstrate that PPFNN performance is comparable to or better than that of the MLP, RBF, LVQ and PNN on the set of seven benchmark problems considered: the IRIS, Sonar, Vowel, 2-Spiral, Wisconsin Breast Cancer Disease, Cleveland Heart Disease and Thyroid Gland Disease data sets. The performance criteria employed in the simulation study were the network training time and the classification rate on test data. The simulation results indicate that the PPFNN consistently performed in the leading group of classifiers over the set of problems tested, which was not the case for the rest of the neuro-classifier algorithms. The training time requirements of the PPFNN were generally minimal, leading to the conclusion that the PPFNN algorithm is a good choice for real-time implementation. With the exception of the Thyroid Gland Disease data set, PPFNN performance was not significantly affected by large variations in the value of its only adjustable parameter, Alpha, which determines the spread of the potential functions. In conclusion, the simulation results indicate that the PPFNN is a robust neuro-classifier algorithm suitable for real-time environments.

REFERENCES

Jiang, H. (1997). Performance Analysis of Probabilistic Potential Function Neural Network Classifier. Master's Thesis, The University of Toledo, Toledo, OH.

Kohonen, T. (1991). Improved Versions of Learning Vector Quantization. IJCNN'91 Proceedings, (1), 545-550.

Serpen, G., Allred, L. G. and Cios, K. J. (1996). Probabilistic Potential Function Neural Network Classifier. ICNN'96 Proceedings, Vol. Special Sessions, 193-198.

Specht, D. F. (1987). Probabilistic Neural Networks for Classification, Mapping, or Associative Memory. IJCNN'87 Proceedings, (1), 525-532.

Parzen, E. (1962). On Estimation of a Probability Density Function and Mode. Annals of Mathematical Statistics, (33), 1065-1076.

Poggio, F. (1994). Regularization Theory, Radial Basis Functions and Networks. In From Statistics to Neural Nets: Theory and Pattern Recognition Applications, NATO ASI Series, (136), 83-104.

Tou, J. T. and Gonzalez, R. C. (1981). Pattern Recognition Principles. Addison-Wesley Publishing Company, Reading, MA.

Pao, Y.-H. (1989). Adaptive Pattern Recognition and Neural Networks. Addison-Wesley Publishing Company, Reading, MA.

Werbos, P. J. (1994). The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. John Wiley & Sons, Inc., New York, NY.

Werbos, P. J. (1991). Links Between Artificial Neural Networks and Statistical Pattern Recognition. In I. K. Sethi and A. K. Jain (Eds.), Artificial Neural Networks and Statistical Pattern Recognition: Old and New Connections, 11-31.

APPENDIX: PARAMETER VALUES FOR NEURAL NETWORK CLASSIFIERS

                             2-Spiral   IRIS   Sonar   Vowel   Wisconsin   Cleveland   Thyroid
MLP    Learning Rate         0.001      0.01   2.0     2.0     2.0         0.001       0.001
MLP    Momentum Constant     0.5        0.9    0.0     0.0     0.0         0.5         0.5
LVQ    Learning Rate         0.01       0.01   0.01    0.01    0.01        0.01        0.01
RBF    Basis Func. Spread    1.8        1.8    1.8     1.8     1.8         1.8         1.8
PNN    Basis Func. Spread    1.8        -      5.3     -       6.8         3.8         -
PPFNN  Alpha                 1.8        0.46   4.4     9.0     6.8         3.8         0.0055