EVOLUTION AND LEARNING IN NEURAL NETWORKS: THE NUMBER AND DISTRIBUTION OF LEARNING TRIALS AFFECT THE RATE OF EVOLUTION


Ron Keesing and David G. Stork*
Ricoh California Research Center, 2882 Sand Hill Road Suite 115, Menlo Park, CA 94025, stork@crc.ricoh.com
*Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305, stork@psych.stanford.edu

Abstract

Learning can increase the rate of evolution of a population of biological organisms (the Baldwin effect). Our simulations show that in a population of artificial neural networks solving a pattern recognition problem, no learning or too much learning leads to slow evolution of the genes, whereas an intermediate amount is optimal. Moreover, for a given total number of training presentations, the fastest evolution occurs if different individuals within each generation receive different numbers of presentations, rather than equal numbers. Because genetic algorithms (GAs) help avoid local minima in energy functions, our hybrid learning-GA systems can be applied successfully to complex, high-dimensional pattern recognition problems.

INTRODUCTION

The structure and function of a biological network derives from both its evolutionary precursors and real-time learning. Genes specify (through development) coarse attributes of a neural system, which are then refined based on experience in an environment containing more information - and

more unexpected information - than the genes alone can represent. Innate neural structure is essential for many high-level problems such as scene analysis and language [Chomsky, 1957]. Although the Central Dogma of molecular genetics [Crick, 1970] implies that learned information cannot be directly transcribed to the genes, such information can appear in the genes through an indirect Darwinian process (see below). As such, learning can change the rate of evolution - the Baldwin effect [Baldwin, 1896]. Hinton and Nowlan [1987] considered a closely related process in artificial neural networks, though they used stochastic search and not learning per se. We present here analyses and simulations of a hybrid evolutionary-learning system which uses gradient-descent learning as well as a genetic algorithm to determine network connections.

Consider a population of networks for pattern recognition, where initial synaptic weights (weights "at birth") are determined by genes. Figure 1 shows the Darwinian fitness of networks (i.e., how many patterns each can correctly classify) as a function of the weights. Iso-fitness contours are not concentric, in general. The tails of the arrows represent the synaptic weights of networks at birth. In the case of evolution without learning, network B has a higher fitness than does A, and thus would be preferentially selected. In the case of gradient-descent learning before selection, however, network A has a higher after-learning fitness, and would be preferentially selected (tips of arrows). Thus learning can change which individuals will be selected and reproduce, in particular favoring a network (here, A) whose genome is "good" (i.e., initial weights "close" to the optimal), despite its poor performance at birth. Over many generations, the choice of "better" genes for reproduction leads to new networks which require less learning to solve the problem - they are closer to the optimal.
The rate of gene evolution is increased by learning (the Baldwin effect).

Figure 1: Iso-fitness contours in synaptic weight space. The black region corresponds to perfect classification (fitness = 5). The weights of two networks are shown at birth (tails of arrows), and after learning (tips of arrows). At birth, B has a higher fitness score (2) than does A (1); a pure genetic algorithm (without learning) would preferentially reproduce B. With learning, though, A has a higher fitness score (4) than B (2), and would thus be preferentially reproduced. Since A's genes are "better" than B's, learning can lead to selection of better genes.
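The selection dynamics of Figure 1 can be made concrete with a minimal simulation. The sketch below is hypothetical (it is not the authors' implementation, and uses a one-dimensional weight and a distance-based fitness purely for illustration): each "network" is a weight value, learning is a few gradient steps toward an assumed optimum, fitness is measured after learning, but reproduction copies the weights present at birth, in accord with the Central Dogma.

```python
import random

OPTIMUM = 0.0   # assumed location of the perfect weights (1-D for clarity)

def fitness(w):
    """Illustrative fitness: higher when the weights are nearer the optimum."""
    return -abs(w - OPTIMUM)

def learn(w, steps, lr=0.1):
    """Gradient-descent learning: move the weights toward the optimum."""
    for _ in range(steps):
        w -= lr * (w - OPTIMUM)
    return w

def next_generation(population, steps):
    """Select on after-learning fitness, but reproduce the birth genotypes."""
    scored = [(fitness(learn(w, steps)), w) for w in population]
    scored.sort(reverse=True)                      # best after-learning fitness first
    survivors = [w for _, w in scored[: len(scored) // 2]]
    # Offspring copy a surviving *birth* weight (not the learned one) and mutate it.
    return [random.choice(survivors) + random.gauss(0, 0.05)
            for _ in range(len(population))]

random.seed(0)
pop = [random.uniform(-2, 2) for _ in range(20)]
for gen in range(30):
    pop = next_generation(pop, steps=5)            # an intermediate amount of learning
print(sum(fitness(w) for w in pop) / len(pop))     # mean birth fitness, close to 0
```

Even though only after-learning performance is scored, the birth weights converge toward the optimum: learning exposes which genotypes are "close," and selection then propagates those genotypes.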

Surprisingly, too much learning leads to slow evolution of the genome, since after sufficient training in each generation, all networks can perform perfectly on the pattern recognition task, and thus are equally likely to pass on their genes, regardless of whether they are "good" or "bad." In Figure 1, if both A and B continue learning, eventually both will identify all five patterns correctly. B will be just as likely to reproduce as A, even though A's genes are "better." Thus the rate of evolution will be decreased - too much learning is worse than an intermediate amount of learning, or even none at all.

SIMULATION APPROACH

Our system consists of a population of 200 networks, each for classifying pixel images of the first five letters of the alphabet. The 9 x 9 input grid is connected to four 7 x 7 sets of overlapping 3 x 3 orientation detectors; each detector is fully connected by modifiable weights to an output layer containing five category units (Fig. 2).

Figure 2: Individual network architecture. The 9x9 pixel input is detected by each of four orientation-selective input layers (7x7 unit arrays), which are fully connected by trainable weights to the five category units. The network is thus a simple perceptron with 196 (= 4x7x7) inputs and 5 outputs. Genes specify the initial connection strengths.

Each network has a 490-bit gene specifying the initial weights (Figure 3). For each of the 49 filter positions and 5 categories, the gene has two bits

which specify which orientation is initially most strongly connected to the category unit (by an arbitrarily chosen factor of 3:1). During training, the weights from the filters to the output layer are changed by (supervised) perceptron learning. Darwinian fitness is given by the number of patterns correctly classified after training. We use fitness-proportional reproduction and the standard genetic algorithm processes of replication, mutation, and cross-over [Holland, 1975]. Note that while fitness may be measured after training, reproduction is of the genes present at birth, in accord with the Central Dogma. This is not a Lamarckian process.

Figure 3: The genetic representation of a network. For each of the five category units, 49 two-bit numbers describe which of the four orientation units (the A, B, C, or D detectors) is most strongly connected at each position within the 7x7 grid. This unit is given a relative connection strength of 3, while the other three orientation units at that position are given a relative strength of 1.

For a given total number of teaching presentations, reproductive fitness might be defined in many ways, including categorization score at the end of learning or during learning; such functions will lead to different rates of evolution. We show simulations for two schemes: in uniform learning each network received the same number (e.g., 20) of training presentations; in

distributed learning networks received a randomly chosen number (1, 34, 36, 16, etc.) of presentations.

RESULTS AND DISCUSSION

Figure 4 shows the population average fitness at birth. The lower curve shows the performance of the genetic algorithm alone; the two upper curves represent genotypic evolution - the amount of information within the genes - when the genetic algorithm is combined with gradient-descent learning. Learning increases the rate of evolution - both uniform and distributed learning are significantly better than no learning. The fitness after learning in a generation (not shown) is typically only 5% higher than the fitness at birth. Such a small improvement at a single generation cannot account for the overall high performance at later generations. A network's performance - even after learning - is more dependent upon its ancestors having learned than upon its having learned the task.

Figure 4: Learning guides the rate of evolution (population average fitness at birth versus generation, for the different learning schemes). In uniform learning, every network in every generation receives 20 learning presentations; in the distributed learning scheme, any network receives a number of patterns randomly chosen between 0 and 40 presentations (mean = 20). Clearly, evolution with learning leads to superior genes (fitness at birth) than evolution without learning.

Figure 5: Selectivity of learning-evolution interactions (average fitness at birth at generation 100 versus learning trials per individual). Too little or too much learning leads to slow evolution, while an intermediate amount of learning leads to significantly higher fitness at birth. This effect is significant in both learning schemes.
(Each point represents the mean of five simulation runs.)
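The genetic representation of Figure 3 can be sketched in code. The following is an illustrative reconstruction, not the authors' program (the names and data layout are assumptions): each of the 5 category units carries 49 two-bit fields, and each field selects which of the four orientation detectors receives relative strength 3 at that position, the other three receiving strength 1.

```python
import random

N_CATEGORIES = 5       # category units (letters A through E)
N_POSITIONS = 49       # 7 x 7 filter positions
N_ORIENTATIONS = 4     # orientation detectors per position

def random_gene():
    """A 490-bit gene: one two-bit field (value 0-3) per (category, position)."""
    return [[random.randrange(N_ORIENTATIONS) for _ in range(N_POSITIONS)]
            for _ in range(N_CATEGORIES)]

def decode(gene):
    """Initial weights at birth: the selected orientation gets strength 3, the rest 1."""
    weights = [[[1.0] * N_ORIENTATIONS for _ in range(N_POSITIONS)]
               for _ in range(N_CATEGORIES)]
    for c in range(N_CATEGORIES):
        for p in range(N_POSITIONS):
            weights[c][p][gene[c][p]] = 3.0    # the arbitrarily chosen 3:1 ratio
    return weights

gene = random_gene()
w = decode(gene)
# Every (category, position) pair has exactly one strong connection.
print(all(sorted(w[c][p]) == [1.0, 1.0, 1.0, 3.0]
          for c in range(N_CATEGORIES) for p in range(N_POSITIONS)))  # True
```

Perceptron learning then modifies the decoded weights during a lifetime, but only the gene itself is subject to replication, mutation, and cross-over.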

Figure 5 illustrates the tuning of these learning-evolution interactions, as discussed above: too little or too much learning leads to poorer evolution than does an intermediate amount of learning. Given excessive learning (e.g., 500 presentations), all networks perform perfectly. This leads to the slowest evolution, since selection is independent of the quality of the genes. Note too in Fig. 4 that distributed learning leads to significantly faster evolution (higher fitness at any particular generation) than uniform learning. In the uniform learning scheme, once networks have evolved to a point in weight space where they (and their offspring) can identify a pattern after learning, there is no more "pressure" on the genes to evolve. In Figure 6, both A and B are able to identify three patterns correctly after uniform learning, and hence both will reproduce equally. However, in the distributed learning scheme, one of the networks may (randomly) receive a small amount of learning. In such cases, A's reproductive fitness will be unaffected, because it is able to solve the patterns without learning, while B's fitness will decrease significantly. Thus in the distributed learning scheme (and in schemes in which fitness is determined in part during learning), there is "pressure" on the genes to improve at every generation. Diversity is a driving force for evolution. Our distributed learning scheme leads to a greater diversity of fitness throughout a population.

Figure 6: Distributed learning leads to faster evolution than uniform learning (iso-fitness contours in synaptic weight space). In uniform learning (shown above), A and B have equal reproductive fitness, even though A has "better" genes. In distributed learning, A will be more likely to reproduce when it (randomly) receives a small amount of learning (shorter arrow) than B will under similar circumstances. Thus "better" genes will be more likely to reproduce, leading to faster evolution.
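The two presentation schemes can be sketched as follows. This is a hypothetical illustration (the paper does not specify the exact sampling distribution, so a uniform draw over 0 to 40 is assumed here, which preserves the mean of 20): uniform learning gives every network identical training, while distributed learning spreads trial counts across the population, so that some networks are occasionally tested nearly "at birth."

```python
import random

def uniform_scheme(pop_size, mean_trials=20):
    """Uniform learning: every network gets the same number of presentations."""
    return [mean_trials] * pop_size

def distributed_scheme(pop_size, mean_trials=20):
    """Distributed learning: per-network trial counts vary, same mean (assumed
    uniform over 0..2*mean for illustration)."""
    return [random.randint(0, 2 * mean_trials) for _ in range(pop_size)]

random.seed(1)
u = uniform_scheme(200)
d = distributed_scheme(200)
print(sum(u) / len(u))       # 20.0 exactly
print(min(d), max(d))        # counts spread across the 0-40 range
```

A network that draws a small count under the distributed scheme is scored close to its birth fitness, which is what keeps selection pressure on the genes even late in evolution.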
CONCLUSIONS

Evolutionary search via genetic algorithms is a powerful technique for avoiding local minima in complicated energy landscapes [Goldberg, 1989; Peterson, 1990], but is often slow to converge in large problems. Conventional genetic approaches consider only the reproductive fitness of

the genes; the slope of the fitness landscape in the immediate vicinity of the genes is ignored. Our hybrid evolutionary-learning approach utilizes the gradient of the local fitness landscape, along with the fitness of the genes, in determining survival and reproduction. We have shown that this technique offers advantages over evolutionary search alone in the single-minimum landscape given by perceptron learning. In a simple pattern recognition problem, the hybrid system performs twice as well as a genetic algorithm alone. A hybrid system with distributed learning, which increases the "pressure" on the genes to evolve at every generation, performs four times as well as a genetic algorithm. In addition, we have demonstrated that there exists an optimal average amount of learning for increasing the rate of evolution - too little or too much learning leads to slower evolution. In the extreme case of too much learning, where all networks are trained to perfect performance, there is no improvement of the genes. The advantages of the hybrid approach in landscapes with multiple minima can be even more pronounced [Stork and Keesing, 1991].

Acknowledgments

Thanks to David Rumelhart, Marcus Feldman, and Aviv Bergman for useful discussions.

References

Baldwin, J. M. "A new factor in evolution," American Naturalist 30, 441-451 (1896)
Chomsky, N. Syntactic Structures. The Hague: Mouton (1957)
Crick, F. H. C. "Central Dogma of Molecular Biology," Nature 227, 561-563 (1970)
Goldberg, D. E. Genetic Algorithms in Search, Optimization & Machine Learning. Reading, MA: Addison-Wesley (1989)
Hinton, G. E. and Nowlan, S. J. "How learning can guide evolution," Complex Systems 1, 495-502 (1987)
Holland, J. H. Adaptation in Natural and Artificial Systems. University of Michigan Press (1975)
Peterson, C. "Parallel Distributed Approaches to Combinatorial Optimization: Benchmark Studies on the Traveling Salesman Problem," Neural Computation 2, 261-269 (1990)
Stork, D. G. and Keesing, R. "The distribution of learning trials affects evolution in neural networks" (1991, submitted)