Identifying irrelevant input variables in chaotic time series problems: Using the genetic algorithm for training neural networks

Identifying irrelevant input variables in chaotic time series problems: Using the genetic algorithm for training neural networks

Randall S. Sexton (1)
Ball State University
Department of Management
Muncie, Indiana 47306
Email: rssexton@mail.bsu.edu
Work: 765-285-5320
Fax: 765-285-8024

(1) Randall Sexton is an Assistant Professor at Ball State University. Dr. Sexton is supported in part by the George A. Ball Distinguished Research Fellowship program.

Identifying irrelevant input variables in chaotic time series problems: Using the genetic algorithm for training neural networks

ABSTRACT

Many researchers consider a neural network to be a "black box" that maps the unknown relationships of inputs to corresponding outputs. By viewing neural networks in this manner, researchers often include many more input variables than are necessary for finding good solutions. This causes unneeded computation as well as impeding the search process by increasing the complexity of the network. The main reason for this practice is the dependence of the vast majority of neural network researchers upon gradient techniques, typically a variation of backpropagation, for network optimization. Since gradient techniques are incapable of identifying unneeded weights in a solution, researchers have not been able to distinguish contributing inputs from those that are irrelevant. By using a global search technique, the genetic algorithm, for neural network optimization, it is possible to identify unneeded weights in the network model, which allows for identification of irrelevant input variables. This paper demonstrates, through an intensive Monte Carlo study, that the genetic algorithm can automatically reduce the dimensionality of neural network models during network optimization. The genetic algorithm is also directly compared with backpropagation networks to show its effectiveness for finding global versus local solutions.

KEY WORDS: Neural Networks, Genetic Algorithm, Backpropagation, Learning, Generalization, Optimization

1. INTRODUCTION

A major goal of NN research is to find solution weights that not only work well for the training data (in-sample data), but also generalize well for interpolation data (out-of-sample). For any neural network (NN) model, as the number of inputs and hidden nodes increases, the complexity of solving these models increases as well. By correctly identifying the relevant variables and hidden nodes in a NN model, researchers can reduce the complexity and dimensionality of the problem, enhancing the probability of finding solutions that generalize to out-of-sample observations. Generalization refers to the ability of the NN to forecast estimates for patterns that have not been seen by the network. By decreasing the number of weights in the NN, the algorithm is forced to develop general rules to discriminate between the input patterns. The problem of producing NNs that generalize better has been widely studied (Burkitt [1991], Cottrell, Girard, Girard, & Mangeas [1993], Drucker & LeCun [1992], Fahlman & Lebiere [1990], Holmström & Koistinen [1992], Kamimura [1993], Karmin [1990], Kruschke [1989], Lendaris & Harls [1990], Romaniuk [1993]). It has been shown that generalization is better in smaller networks (Baum & Haussler [1989], Schiffman, Joost, & Werner [1993]).

Much of this past research attempts to produce a parsimonious NN solution through pruning techniques. Pruning techniques progressively reduce the size of a large network by eliminating weights in the NN. The weights that are eliminated fall into two categories for gradient training algorithms: active and near-inactive. Near-inactive weights are those that have decayed close to zero. Removing these weights, since they are arbitrarily close to zero, will likely have little effect on the ability of the NN to generalize. If weights are pruned that are not close to zero, or active weights, the NN will need considerable retraining (Lee [1997]). In this paper, the genetic algorithm (GA) is used simultaneously to reduce the errors between estimates and real output

values and the number of connections in the NN. By doing so, the GA can find an optimal NN architecture as well as identify irrelevant input variables in the NN model.

The majority of neural network researchers use a gradient technique, typically a variation of backpropagation (Rumelhart & McClelland [1986]), for neural network optimization. A problem that often occurs when using backpropagation (BP) is a loss of generalization power. There are two main reasons why this occurs: over-parameterization and local convergence. Over-parameterization is simply including more weights in the solution than are necessary to estimate the function. By including additional weights in the solution, the degrees of freedom are reduced. However, there is currently no known method for calculating the extent of the reduction. When over-parameterization occurs, the network tends to memorize the training data, which decreases the generalization ability of the network. Unneeded weights are added to the model by including irrelevant input variables or unneeded hidden nodes. Currently, no effective methods for identifying these unneeded weights are known for gradient techniques. Local convergence also contributes to a decrease in generalization ability. BP, by its very nature, converges to local solutions. Although BP could possibly converge upon a local solution that is also global, this is unlikely because of the complex error surfaces generated when optimizing NNs. To improve the generalization ability of NNs, an appropriate global search method is needed that will identify unneeded weights in the solution, resulting in a parsimonious NN solution. The genetic algorithm (GA) is proposed as an appropriate global search technique that is not limited to derivative-based objective functions for NN optimization. For a limited number of problems, the GA was shown to significantly outperform BP for NN training (Sexton, Dorsey, & Johnson [1998]). The purpose of this paper is to show the effectiveness of the GA in searching for an optimal parsimonious NN solution for a set of chaotic time series problems.

Although the GA has been used in past research for identification of relevant input variables and NN architectures for BP-trained networks, this research is unique in that it also uses the GA for training the NN. The benefits that result from this search algorithm include a reduction in network complexity, identification of irrelevant input variables, and increased generalization power of the NN solution. The following section includes a general description of the genetic algorithm. Section 3 describes the Monte Carlo study. Section 4 provides the results of the comparison, followed by final remarks and conclusions in Section 5.

2. THE GENETIC ALGORITHM

Research combining genetic algorithms and neural networks began to appear in the mid to late 1980s. More than 250 references can be readily found in the literature today. A survey of that research can be found in Schaffer et al. [1992]. The two primary directions of this past research include using the GA to improve the performance of BP by finding optimal neural network architectures and/or parameter settings, or as an alternative to BP for optimizing the network. This paper combines these past research directions by focusing on the use of the GA as an alternative to BP that can also find optimal NN architectures. Most of the past research using the genetic algorithm for network optimization has found that the GA is not competitive with the best gradient learning methods. It has recently been shown, however (see Sexton, Dorsey and Johnson [1998]), that the problem with this research lies in the implementation of the GA and not in its inability to perform the task. For example, the majority of this research encodes each candidate solution of weights into binary strings. This approach works well for optimization of problems with only a few variables. For neural networks with large numbers of weights, this binary encoding results in extremely long strings.

As a result, the patterns that are essential to the GA's effectiveness are virtually impossible to maintain with the standard GA operators such as crossover and mutation. A more effective approach is to allow the GA to operate over real-valued parameters. Examples of this approach can be found in Montana & Davis [1989] and Sexton et al. [1998]. Although Montana & Davis successfully outperformed BP using the GA, their specific implementation of the genetic algorithm resulted in the crossover operator causing excessive loss of information about the schemata of the parameters. These schemata were influential in the prior generation's selection of the current generation's strings, and therefore the loss of this information reduces the effectiveness of the search process. The alternative approach described in the Sexton et al. [1998] paper also successfully outperformed BP on a variety of problems. This line of research is based on the algorithm developed by Dorsey & Mayer [1995]. As opposed to BP, the GA is a global search procedure that searches from one population of solutions to another, focusing on the area of the best solution so far, while continuously sampling the total parameter space. The GA has recently been shown to perform exceptionally well at obtaining the global solution when optimizing difficult nonlinear functions (Dorsey & Mayer [1994], Dorsey & Mayer [1995]). An extension of this research has also shown the GA to perform well for optimizing the NN, another complex nonlinear function (Sexton, Dorsey, & Johnson [1998], Dorsey, Johnson, & Mayer [1994]). Unlike BP, which moves from one point to another based on gradient information, the GA simultaneously searches in many directions, which enhances the probability of finding the global optimum. Figure 1 illustrates a simple outline of the GA used in this study, while the terms used and parameter settings are briefly described in the following paragraphs. A formal description of the algorithm can be found in Dorsey & Mayer [1995].

Figure 1. Outline of the Genetic Algorithm

Initialization: Choose an initial population containing 20 solutions to be the current population. Each solution consists of a string of weights that are plugged into the NN. Compute the objective function value for each solution in the population.

Evaluation: Each member of the current population is evaluated by a fitness function based on its objective function value, which assigns each solution a probability of being redrawn in the next generation.

Reproduction: A mating pool of 20 solutions is created by selecting solutions from the current population based on their assigned probabilities.

Crossover: The solutions in the mating pool are then randomly paired, constructing 10 sets of parent solutions. A point is randomly selected for each pair, and the parent solutions switch the weights that precede that point, generating 20 new solutions, or the next generation.

Mutation: For each generation, there is a small probability that any weight in the current population will be replaced by a value randomly drawn from the entire weight space.

Mutation2: For each generation, there is a small probability that any weight in the current population will be replaced by a hard zero.

Termination: The algorithm terminates after a user-specified number of generations.

Similar to BP, the GA randomly draws values in order to begin the search. However, unlike BP, which only draws weights for one solution, the GA draws weights for a population of solutions. The population size for this study is set to 20, which is user defined and is based on past research (Dorsey & Mayer [1995]). Once the population of solutions is drawn, the global search begins with this first generation. Each of the solutions in the population is then evaluated based on a preselected objective function, which is not necessarily differentiable. Since our objective is to find a global solution

that eliminates unneeded connections in the model, it is necessary to include an objective function that is not differentiable. The objective function chosen for this study is shown in Equation 1. The goal of this objective function is to find a NN solution that reduces the sum of squared errors as well as the number of non-zero weights.

(1)   E = \sum_{i=1}^{N} (O_i - \hat{O}_i)^2 + C \sqrt{ \sum_{i=1}^{N} (O_i - \hat{O}_i)^2 / N }

where N is the number of exemplars in the data set, O_i is the output value, \hat{O}_i is the NN estimate, and C is the number of non-zero weights in the solution. Although this objective function seems to work well for the problems in this study, the penalty value assigned for non-zero connections is arbitrary. Additional research, beyond the scope of this study, is warranted for finding an optimal penalty assignment and is left for future research.
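
For concreteness, Equation 1 (as reconstructed above) could be computed as in the following sketch. The single-hidden-layer architecture, the sigmoid hidden units with a linear output, and all function and variable names are assumptions made for illustration; they are not taken from the authors' implementation.

    import numpy as np

    def nn_estimate(weights, X, n_hidden):
        """NN estimate for a single-hidden-layer network with sigmoid hidden
        units and a linear output (an assumed architecture for this sketch).
        The flat weight vector holds the input-to-hidden weights and hidden
        biases, followed by the hidden-to-output weights and the output bias."""
        n_inputs = X.shape[1]
        split = (n_inputs + 1) * n_hidden
        W1 = weights[:split].reshape(n_inputs + 1, n_hidden)
        W2 = weights[split:].reshape(n_hidden + 1, 1)
        Xb = np.hstack([X, np.ones((len(X), 1))])          # append bias input
        H = 1.0 / (1.0 + np.exp(-Xb @ W1))                  # sigmoid hidden layer
        Hb = np.hstack([H, np.ones((len(H), 1))])
        return (Hb @ W2).ravel()

    def equation_1(weights, X, y, n_hidden):
        """E = sum of squared errors + C * sqrt(SSE / N), where C counts the
        non-zero weights in the candidate solution (Equation 1 as reconstructed)."""
        errors = y - nn_estimate(weights, X, n_hidden)
        sse = np.sum(errors ** 2)
        C = np.count_nonzero(weights)
        return sse + C * np.sqrt(sse / len(y))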

Once the solutions in the population are evaluated, a probability is assigned to each solution based on the value of the objective function. For example, using the chosen objective function, the solutions that result in the smallest error term are assigned the highest probabilities. This completes the first generation. The second generation is then randomly drawn based on the probabilities assigned in the former. For example, the best solutions (the ones with the smallest errors and therefore the highest assigned probabilities) in the first generation are more likely to be drawn for the second generation of 20 solutions. This is known as reproduction, which parallels the process of natural selection or "survival of the fittest." The solutions that are most favorable in optimizing the objective function will reproduce and thrive in future generations, while poorer solutions die out.

Before the solutions in the second generation can be evaluated, two processes must take place: crossover and mutation. This new population, which only includes solutions that existed in the prior generation, is randomly grouped into pairs of solutions. For each pair of solutions, a random subset of weights is chosen and switched with its paired solution (crossover). For example, if a solution contains 10 weights, a random integer value is drawn from one to 10 for the first pair of solutions; for this example, let us say five is selected. Every weight above the 5th weight is then switched between the paired solutions, resulting in two new solutions for each pair. Once this is done for each pair of solutions, crossover is complete. In order to sample the entire parameter space, and not be limited only to those initially drawn random values from the first generation, mutation must occur. Each solution in this new generation now has a small probability that any of its weights may be replaced with a value uniformly selected from the parameter space (mutation). If mutation occurs, the likelihood of this new weight surviving into the next generation is based on the probabilities assigned when the new solution is applied to the objective function. For example, if the solution now has a lower error value because of this new mutated weight, the solution containing this weight will have a higher probability of being drawn in the next generation, and a lower probability of being drawn if the mutation causes the error to increase. To allow the GA to identify irrelevant weights in the solutions, an additional process (mutation2) was included. Each solution in this new generation also has a small probability that any of its weights may be replaced with a hard zero. Once reproduction, crossover, mutation, and mutation2 have occurred, the new generation can be evaluated in order to determine the new probabilities for the next generation. This process continues until the initial population evolves to a generation that best solves the optimization problem, ideally the global solution.
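
Putting the operators of Figure 1 together, one generation of the search might look like the sketch below. The population size of 20 follows the paper; the mutation rates, the weight range, and the function names (ga_generation, train_ga) are illustrative assumptions, and the objective argument can be any scoring function such as the equation_1 sketch above.

    import numpy as np

    def ga_generation(pop, objective, rng, weight_range=10.0, p_mut=0.01, p_zero=0.01):
        """One generation: evaluation, reproduction, crossover, mutation, mutation2.
        'pop' is a (20, n_weights) array of candidate weight strings; 'objective'
        scores a weight vector (lower is better). Rates and ranges are assumed."""
        pop_size, n_weights = pop.shape
        errors = np.array([objective(w) for w in pop])
        # Evaluation: smaller objective values receive higher selection probabilities.
        fitness = 1.0 / (errors - errors.min() + 1e-9)
        probs = fitness / fitness.sum()
        # Reproduction: draw a mating pool of pop_size solutions with replacement.
        pool = pop[rng.choice(pop_size, size=pop_size, p=probs)].copy()
        # Crossover: pair the solutions and swap every weight beyond a random point.
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_weights)
            pool[i, cut:], pool[i + 1, cut:] = pool[i + 1, cut:].copy(), pool[i, cut:].copy()
        # Mutation: with a small probability, replace a weight with a value drawn
        # uniformly from the weight space.
        mutate = rng.random(pool.shape) < p_mut
        pool[mutate] = rng.uniform(-weight_range, weight_range, size=mutate.sum())
        # Mutation2: with a small probability, replace a weight with a hard zero;
        # this is what lets whole input connections be eliminated.
        pool[rng.random(pool.shape) < p_zero] = 0.0
        return pool

    def train_ga(objective, n_weights, n_generations=3000, pop_size=20, seed=0):
        """Initialization and termination: evolve for a fixed number of generations
        and return the best weight string found in the final population."""
        rng = np.random.default_rng(seed)
        pop = rng.uniform(-10.0, 10.0, size=(pop_size, n_weights))
        for _ in range(n_generations):
            pop = ga_generation(pop, objective, rng)
        return min(pop, key=objective)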

3. MONTE CARLO DESCRIPTION

3.1. Relevance of the problems

Deterministic chaotic dynamical systems, such as the functions used in this study, are of great interest to researchers because of their comparability to the chaotic behavior of economic and financial data. Chaos as discussed in this research is a special type of system that is capable of exhibiting complex, often aperiodic, behavior in time. Recently there has been great interest in determining whether certain financial and economic time series are better described by linear stochastic models or are more appropriately characterized by deterministic chaos. Empirical research has been hampered in detecting the presence of chaos in such time series due to the apparent randomness of this type of data. While the irregularity of such variables as GNP, employment, interest rates, and exchange rates has generally been attributed to random fluctuations, the ability of even simple deterministic chaotic models to produce complex time paths that appear to be random has attracted attention as a possible alternative explanation. Taking this into consideration, the significance of accurately estimating such chaotic behavior is apparent, and it is the reason this type of data is used for this study.

3.2. Chaotic time series problems

The following chaotic time series problems were included in this study. Problems 1-5 were taken from the chaos literature: problems 1-4 were taken from Schuster [1995], and the 5th problem is a modified version of the Mackey-Glass equation from Gallant and White [1992]. To test the GA's ability to determine irrelevant variables in larger problems, the Mackey-Glass equation (problem 5) was modified to incorporate additional independent input variables.

Although the functions are known, the NNs used to estimate these functions do not use this information but rely solely on the generated data for training purposes.

Chaotic time series problems:

1) X_t = 4X_{t-1}(1 - X_{t-1})
2) X_t = 4X_{t-1}^3 - 3X_{t-1}
3) X_t = X_{t-1}^2 - 1.8
4) X_t = X_{t-1}^2 - 1.6
5) X_t = X_{t-1} - 0.1X_{t-1} + 0.2X_{t-5} / (1 + X_{t-5}^{10})
6) X_t = X_{t-1} - 0.1X_{t-1} + 0.2X_{t-15} / (1 + X_{t-15}^{10})
7) X_t = X_{t-1} - 0.1X_{t-1} + \sum_{i=1}^{12} 0.2X_{t-i} / (1 + X_{t-i}^{10})

3.3. Data generation

Forty data sets of 100 observations each were generated for each of the seven problems. Each data set was initialized with a random number drawn from a uniform distribution in order to begin the time series. To show the ability of the GA to identify irrelevant variables in the NN model, an additional irrelevant input variable was added to each of the data sets. The 40 data sets were basically split into two experiments. For half of the data sets (20 data sets) for each problem, an additional irrelevant variable was included that consisted of a randomly drawn value from a uniform distribution. The second half of the data sets (20 data sets) for each problem included irrelevant input variables that consisted of an additional time lag. In both cases the irrelevant input variables had no effect on the actual output variable. For each problem and irrelevant input variable type (random or lag), 10 data sets were used for training and 10 for testing. Problems five and six are interesting because they already include irrelevant lag variables. For problems five and six the GA will have to identify four and 14 irrelevant variables, respectively. Problem seven incorporates all 12 lag variables into the output. The only irrelevant variable included is the additional input variable added to the data sets.
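
As an illustration of the data-generation step, the sketch below builds one training data set for problem 5, using lagged values of the series as inputs plus one irrelevant column. The number of lagged inputs, the uniform ranges, and the burn-in period are assumptions made where Section 3.3 does not specify them, and the formula for problem 5 is the reconstructed form given above.

    import numpy as np

    def problem_5_series(n_obs=100, lag=5, burn_in=200, seed=0):
        """Problem 5 (as reconstructed): X_t = X_{t-1} - 0.1*X_{t-1}
        + 0.2*X_{t-lag} / (1 + X_{t-lag}**10), started from uniform random values."""
        rng = np.random.default_rng(seed)
        x = list(rng.uniform(0.1, 1.0, size=lag))   # random initial conditions (assumed range)
        for _ in range(burn_in + n_obs):
            x.append(x[-1] - 0.1 * x[-1] + 0.2 * x[-lag] / (1.0 + x[-lag] ** 10))
        return np.array(x[-n_obs:])

    def make_patterns(series, n_lags=5, irrelevant="random", seed=1):
        """Builds (inputs, target) patterns: the inputs are X_{t-1}..X_{t-n_lags}
        plus one irrelevant column, either a uniform random draw or one extra,
        unused lag, following the two experiments described in Section 3.3."""
        rng = np.random.default_rng(seed)
        X, y = [], []
        for t in range(n_lags + 1, len(series)):
            lags = [series[t - k] for k in range(1, n_lags + 1)]
            extra = rng.uniform() if irrelevant == "random" else series[t - n_lags - 1]
            X.append(lags + [extra])
            y.append(series[t])
        return np.array(X), np.array(y)

    series = problem_5_series()                    # 100 observations, as in Section 3.3
    X_train, y_train = make_patterns(series, irrelevant="lag")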

3.4. Measures and comparisons

The main theme of this paper is to demonstrate the GA's ability to identify irrelevant variables in the NN model. This is done by first identifying unneeded weights in the NN model. Irrelevant variables are then determined by simply inspecting the solution: those inputs that have zeros for all connection weights are judged irrelevant to the model. The performance measurement for the GA will include the percentage of correctly identified irrelevant variables across all replications and problems. Although irrelevant variable identification is the major theme of this study, it is meaningless if the solutions found are inferior to those of more standard methods of NN optimization. For this reason, two variations of the BP algorithm are compared with the GA-trained networks, based on Root Mean Squared Error (RMSE) and CPU time in seconds.
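
The inspection step described above is simple enough to state in code: an input is flagged as irrelevant when every one of its input-to-hidden connection weights in the GA solution is zero. The weight layout matches the earlier sketches and is an assumption, not the authors' representation.

    import numpy as np

    def irrelevant_inputs(weights, n_inputs, n_hidden):
        """Return the indices of input variables whose connections to every
        hidden node are hard zeros in the solution vector (weight layout as in
        the earlier sketches; assumed)."""
        W1 = weights[:(n_inputs + 1) * n_hidden].reshape(n_inputs + 1, n_hidden)
        input_rows = W1[:n_inputs]          # the last row holds the hidden biases
        return [i for i, row in enumerate(input_rows) if np.all(row == 0.0)]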

3.5. Training with the genetic algorithm

The GA and the corresponding parameters used for this study were set to values recommended by Dorsey and Mayer [1995]. The only parameter selected was the number of generations for training. For this study, 3,000 generations (60,000 epochs) was determined to be sufficient for finding superior solutions over those of BP. Although the network could have trained further with more generations, this was not necessary to demonstrate the effectiveness of a global search algorithm. In each case the GA had not converged but was stopped after the specified number of generations. If the GA were allowed to continue, it would converge arbitrarily close to the global solution, but the additional search time was unnecessary for the comparison.

The number of hidden nodes included in the GA-optimized NNs was determined by an automatic pre-run of each data set. For each run, the NN architecture started with only one hidden node and trained for 100 generations. Once the training terminated, the best error was saved and one additional hidden node was added to the network. Training then commenced for 100 additional generations. This process continues until a network is found that generates an error larger than the previous one. Once this occurs, the NN architecture is set to the number of hidden nodes that generated the best error. Although the number of generations (100) used to determine the NN architecture is set arbitrarily, this value seems to work sufficiently well for these problems. Further research is needed to find an optimal epoch value for these runs.
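
The automatic pre-run reduces to a short loop, sketched below under the assumption of a train_for_generations helper that runs the GA for a fixed number of generations on a given architecture and returns the best objective value found; the helper and its signature are illustrative, not the authors' code.

    def select_hidden_nodes(train_for_generations, max_hidden=20, pre_run_generations=100):
        """Grow the network one hidden node at a time, training each candidate
        for 100 generations, and stop as soon as the best error becomes larger
        than the previous architecture's; return the size with the best error.
        (train_for_generations is an assumed helper: architecture in, error out.)"""
        best_hidden, best_error = None, float("inf")
        for n_hidden in range(1, max_hidden + 1):
            error = train_for_generations(n_hidden, pre_run_generations)
            if error > best_error:       # error grew: keep the previous architecture
                break
            best_hidden, best_error = n_hidden, error
        return best_hidden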

3.6. Training with backpropagation

Two variations of BP were used as a baseline comparison with the GA results, in order to show the GA's ability to identify unneeded weights and their corresponding irrelevant variables, and to show its ability to find superior solutions. The first BP variation is the Cascade Correlation (Cascor) algorithm developed by Fahlman and Lebiere [1990]. This algorithm builds a NN topology while simultaneously identifying the appropriate connection weights. Although this algorithm attempts to determine the correct number of hidden nodes to include in the NN model, it still lacks the ability to identify specific weights that are not needed. The second variation of BP was the standard BP algorithm with three different architectures. This set of runs included three, four, and five hidden nodes for training on all data sets. For both variations the learning rate (step value), momentum, and learning rate coefficient ratio were set to 1.0, 0.9, and 0.5, respectively. The learning rate coefficient ratio reduced the learning rate and momentum term by 0.5 every 10,000 epochs. This was done to help eliminate oscillations and to converge more effectively upon a solution. An epoch is defined as one complete pass through the training set. Each of the four BP configurations trained for 250,000 epochs for each data set.

4. RESULTS

The GA trained on 20 different replications for each problem, which included 10 replications that had a random irrelevant variable and 10 replications that had an additional lag irrelevant variable, totaling 140 different NNs. Out of these 140 replications the GA correctly identified all, or 100%, of the irrelevant variables, including the three additional irrelevant variables in problem five and the 13 irrelevant variables in problem six. Figure 2 illustrates the connections of a GA-trained NN for one of problem 5's replications (irrelevant lag data set). As can be seen in this figure, the GA correctly identified the two lags that contributed to the output variable, while eliminating the connections to the irrelevant variables.

Figure 2. NN architecture for replication 1, problem 5 (the figure shows the output node, hidden nodes, a bias node, and input nodes labeled -1 through -6).

To make this finding significant, a comparison was made to demonstrate the GA's superiority for finding global solutions. Tables 1 and 2 show the average RMSE on the test sets for the GA and the four configurations of BP, for replications containing random irrelevant input variables and lag irrelevant input variables, respectively.

Table 1 - Average RMSE for replications containing random irrelevant input variables

Problem   GA         BPC        BP3        BP4        BP5
1         2.06E-07   2.07E-01   5.62E-02   5.50E-02   2.51E-02
2         2.94E-06   7.22E-01   7.33E-01   7.34E-01   7.32E-01
3         4.51E-06   2.93E-01   7.46E-02   6.82E-02   6.52E-02
4         1.24E-06   1.10E-01   3.26E-02   4.72E-02   1.87E-02
5         9.64E-02   2.86E-01   2.88E-01   3.02E-01   2.62E-01
6         1.37E-01   6.17E-01   6.13E-01   6.39E-01   6.16E-01
7         1.34E-01   3.73E-01   3.59E-01   3.46E-01   3.41E-01

BPC = Cascor algorithm; BP3 = 3 hidden nodes; BP4 = 4 hidden nodes; BP5 = 5 hidden nodes

Table 2 - Average RMSE for replications containing lag irrelevant input variables

Problem   GA         BPC        BP3        BP4        BP5
1         4.13E-06   2.22E-01   8.52E-02   8.40E-02   6.83E-02
2         3.14E-06   7.22E-01   7.15E-01   7.19E-01   7.19E-01
3         4.06E-06   1.88E-01   4.15E-01   1.24E-01   1.07E-01
4         7.24E-07   2.09E-01   4.47E-02   1.97E-02   1.97E-02
5         8.44E-02   2.86E-01   2.64E-01   2.14E-01   1.88E-01
6         1.15E-01   6.10E-01   5.86E-01   5.95E-01   6.33E-01
7         1.25E-01   3.57E-01   3.46E-01   3.41E-01   3.44E-01

BPC = Cascor algorithm; BP3 = 3 hidden nodes; BP4 = 4 hidden nodes; BP5 = 5 hidden nodes

As can be seen in these tables, the GA finds solutions that are clearly superior. Although it is apparent that the GA outperforms all four BP configurations based on RMSE, a statistical test is needed to show significant differences between these solutions. A statistical comparison of the test results was conducted using the Wilcoxon matched-pairs signed-ranks (2-tailed P significance) test. This test is designed to test a hypothesis about the location of a population distribution. It does not require the assumption that the population be normally distributed and is used in place of the one-sample t-test when the normality assumption is questionable. More information on the Wilcoxon matched-pairs signed-ranks test can be found in Conover [1980]. The best estimates for both the GA and BP were used for the test. The routine from the SPSS for Windows software package was used. The GA solutions dominated the BP solutions in every test set at the 99% level of significance.
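
The same matched-pairs comparison could be reproduced with any standard statistical routine (the study used SPSS); the sketch below uses scipy.stats.wilcoxon, and the paired RMSE values shown are placeholders echoing the Table 1 averages, not the replication-level data actually tested.

    from scipy.stats import wilcoxon

    # Paired out-of-sample RMSE values for the same replications; these are
    # placeholder numbers echoing the Table 1 averages, not the study's data.
    rmse_ga = [2.06e-7, 2.94e-6, 4.51e-6, 1.24e-6, 9.64e-2, 1.37e-1, 1.34e-1]
    rmse_bpc = [2.07e-1, 7.22e-1, 2.93e-1, 1.10e-1, 2.86e-1, 6.17e-1, 3.73e-1]

    stat, p_value = wilcoxon(rmse_ga, rmse_bpc)   # two-sided matched-pairs signed-rank test
    print(f"Wilcoxon statistic = {stat}, two-tailed p = {p_value:.4f}")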

Both the GA and the Cascor variation of BP attempted to find optimal NN architectures. Table 3 shows the average number of hidden nodes found for each problem across all replications. Since the architectures for the standard variation of BP were static, there was no need for additional comparisons. Although the Cascor variation of BP found similar optimal structures for problems 1-5 and seven, this algorithm failed to outperform the GA on these problems. For problem six, which included several more irrelevant input variables, the Cascor algorithm included a much larger number of hidden nodes than the GA did. Also, since Cascor is gradient based, this algorithm, unlike the GA, was incapable of identifying unneeded weights in the NN solutions.

Table 3 - Average number of hidden nodes

          Random irrelevant variables    Lag irrelevant variables
Problem   GA        BPC                  GA        BPC
1         3.9       3.7                  2.8       2.1
2         3.2       2.5                  3.7       1.9
3         3.5       4.4                  3.5       5.7
4         3.7       3.5                  3.0       2.1
5         3.4       5.4                  3.8       5.2
6         2.9       15.9                 3.6       13.2
7         5.0       5.6                  4.1       6.3

The final comparison between the GA and the four BP configurations was based on CPU time in seconds. All runs were made on a 200-MHz Pentium Pro workstation running the NT 4.0 operating system. Tables 4 and 5 show the time differences for these runs. It can be seen from these tables that the GA not only finds global parsimonious solutions, but does so in an efficient manner. Although the GA was prematurely terminated, further training would only have resulted in better forecasts.

Table 4 - Average CPU time comparisons in seconds (random irrelevant input)

Problem   GA       BPC      BP3      BP4      BP5
1         52.5     155.2    135.3    134.9    135.4
2         42.6     148.0    135.3    135.8    134.8
3         47.9     159.4    133.1    135.4    135.7
4         49.3     154.0    133.2    134.8    134.3
5         135.1    165.4    135.2    137.8    135.3
6         119.8    247.7    132.3    137.4    141.9
7         146.1    398.9    132.2    138.1    142.3

Table 5 - Average CPU time comparisons in seconds (lag irrelevant input)

Problem   GA       BPC      BP3      BP4      BP5
1         37.9     145.6    133.5    131.0    133.0
2         48.1     144.4    132.9    133.4    133.4
3         49.1     167.2    133.8    133.2    131.6
4         40.7     145.6    133.0    133.6    133.1
5         154.3    164.2    133.1    135.5    133.7
6         134.0    289.3    132.1    138.2    141.5
7         134.5    318.9    131.9    137.9    142.1

5. CONCLUSIONS

Neural networks offer researchers a highly versatile tool for estimation. Unfortunately, in past research this tool has been limited by the use of gradient techniques for optimization. By using an appropriate global search technique, such as the genetic algorithm, many of the limitations of gradient techniques can be eliminated. It has been shown in this intensive Monte Carlo study that the GA was able to effectively identify 100% of the irrelevant input variables for these chaotic time series problems. The significance of irrelevant input variable identification lies in the additional information that it gives to researchers, as well as in the improved generalization of neural network models. It was also shown that, compared with BP, the GA was able to find significantly superior solutions in an efficient manner. Hopefully, these results will generate interest in future neural network research that builds upon the findings of this study.

REFERENCES

Baum, E. B., & Haussler, D. [1989]. What size net gives valid generalization? Neural Computation, 1, 151-160.

Burkitt, A. N. [1991]. Optimisation of the architecture of feed-forward neural nets with hidden layers by unit elimination, Complex Systems, 5, 371-380.

Cottrell, M., Girard, B., Girard, Y., & Mangeas, M. [1993]. Time series and neural network: A statistical method for weight elimination, In M. Verleysen (Ed.), European Symposium on Artificial Neural Networks (157-164). Brussels: D facto.

Conover, W. J. [1980]. Practical nonparametric statistics, 2nd ed. New York: John Wiley & Sons.

Dorsey, R. E., Johnson, J. D., & Mayer, W. J. [1994]. "A genetic algorithm for the training of feedforward neural networks," Advances in Artificial Intelligence in Economics, Finance, and Management (J. D. Johnson and A. B. Whinston, eds.), (Vol. 1). JAI Press Inc., Greenwich, CT, 93-111.

Dorsey, R. E., & Mayer, W. J. [1995]. "Genetic algorithms for estimation problems with multiple optima, non-differentiability, and other irregular features," Journal of Business and Economic Statistics, 13(1), 53-66.

Dorsey, R. E., & Mayer, W. J. [1994]. "Optimization using genetic algorithms," Advances in Artificial Intelligence in Economics, Finance, and Management (J. D. Johnson and A. B. Whinston, eds.), (Vol. 1). JAI Press Inc., Greenwich, CT, 69-91.

Drucker, H., & LeCun, Y. [1992]. Improving generalisation performance using double backpropagation, IEEE Transactions on Neural Networks, 3, 991-997.

Fahlman, S. E., & Lebiere, C. [1990]. "The cascade-correlation learning architecture," Advances in Neural Information Processing Systems (Vol. II). Morgan Kaufmann, San Mateo, CA, 524-532.

Gallant, A. R., & White, H. [1992]. "On learning the derivatives of an unknown mapping with multilayer feedforward networks," Artificial Neural Networks: Approximation and Learning Theory (H. White, ed.), Blackwell Publishers, Cambridge, MA, 206-223.

Holmström, L., & Koistinen, P. [1992]. Using additive noise in backpropagation training, IEEE Transactions on Neural Networks, 3, 24-38.

Kamimura, R. [1993]. Internal representation with minimum entropy in recurrent neural networks: Minimizing entropy through inhibitory connections, Network: Computation in Neural Systems, 4, 423-440.

Karmin, E. D. [1990]. A simple procedure for pruning backpropagation trained networks, IEEE Transactions on Neural Networks, 1, 239-242.

Kruschke, J. K. [1989]. Distributed bottlenecks for improved generalization in backpropagation networks, International Journal of Neural Networks Research and Applications, 1, 187-193.

Lee, C. W. [1997]. Training feedforward neural networks: An algorithm giving improved generalization, Neural Networks, 10(1), 61-68.

Lendaris, G. G., & Harls, I. A. [1990]. Improved generalization in ANNs via use of conceptual graphs: A character recognition task as an example case, Proceedings IJCNN-90 (551-556). Piscataway, NJ: IEEE.

Montana, D. J., & Davis, L. [1989]. "Training feedforward neural networks using genetic algorithms," Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 379-384.

Romaniuk, S. G. [1993]. Pruning divide and conquer networks, Network: Computation in Neural Systems, 4, 481-494.

Rumelhart, D. E., & McClelland, J. L. (Eds.). [1986]. Parallel Distributed Processing (Vol. 1). MIT Press, Cambridge, MA.

Schaffer, J. D., Whitley, D., & Eshelman, L. J. [1992]. "Combinations of genetic algorithms and neural networks: A survey of the state of the art," COGANN-92: Combinations of Genetic Algorithms and Neural Networks, IEEE Computer Society Press, Los Alamitos, CA, 1-37.

Schiffman, W., Joost, M., & Werner, R. [1993]. Comparison of optimized backpropagation algorithms, In M. Verleysen (Ed.), European Symposium on Artificial Neural Networks (97-104). Brussels: D facto.

Schuster, H. [1995]. Deterministic Chaos: An Introduction, VCH, Weinheim, New York.

Sexton, R. S., Dorsey, R. E., & Johnson, J. D. [1998]. "Toward a global optimum for neural networks: A comparison of the genetic algorithm and backpropagation," Decision Support Systems, 22(2), 171-186.