A SELF-LEARNING NEURAL NETWORK

A. Hartstein and R. H. Koch
IBM - Thomas J. Watson Research Center, Yorktown Heights, New York

ABSTRACT

We propose a new neural network structure that is compatible with silicon technology and has built-in learning capability. The thrust of this work is a new synapse function. The synapses have the feature that the learning parameter is embodied in the thresholds of MOSFET devices and is local in character. The network is shown to be capable of learning by example as well as exhibiting the desirable features of the Hopfield type networks.

The thrust of what we want to discuss is a new synapse function for an artificial neuron to be used in a neural network. We choose the synapse function to be readily implementable in VLSI technology, rather than choosing a function which is either our best guess for the function used by real synapses or mathematically the most tractable. In order to demonstrate that this type of synapse function provides interesting behavior in a neural network, we embed it in a Hopfield {Hopfield, 1982} type network and provide the synapses with a Hebbian {Hebb, 1949} learning capability. We then show that this type of network functions in much the same way as a Hopfield network and also learns by example. Some of this work has been discussed previously {Hartstein, 1988}.

Most neural networks that have been described use a multiplicative function for the synapses. The inputs to the neuron are multiplied by weighting factors and the results are summed in the neuron. The sum is then put into a hard threshold device or a device with a sigmoid output. This is not the easiest function for a MOSFET to perform, although it can be done. Over a large range of parameters, a MOSFET is a linear device, with the output current being a linear function of the input voltage relative to a threshold voltage.
If one could directly utilize these characteristics, one would be able to design a neural network more compactly.
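The contrast between the conventional multiplicative synapse and a MOSFET-style linear response can be sketched side by side. This is a minimal illustration, not the authors' implementation; the function names and the sigmoid gain are assumptions:

```python
import numpy as np

def sigmoid(x, gain=1.0):
    """Soft threshold applied to the summed synaptic contributions."""
    return 1.0 / (1.0 + np.exp(-gain * x))

def multiplicative_neuron(v_in, weights, gain=1.0):
    """Conventional synapse: each input is multiplied by a weight,
    the products are summed, and the sum passes through a sigmoid."""
    return sigmoid(np.dot(weights, v_in), gain)

def mosfet_style_neuron(v_in, thresholds, gain=1.0):
    """MOSFET-style synapse: each input contributes linearly according
    to its distance from a learned threshold voltage, mirroring the
    linear current-voltage characteristic described above."""
    return sigmoid(np.sum(np.abs(v_in - thresholds)), gain)
```

With every input sitting exactly at its threshold, the summed distance is zero and the output rests at 0.5, the midpoint of the 0-to-1 representation.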

We propose that we directly use MOSFETs as the input devices for the neurons in the network, utilizing their natural characteristics. We assume the following form for the input of each neuron in our network:

V_i = f( Σ_j | V_j − T_ij | )    (1)

where V_i is the output, the V_j are the inputs, and the T_ij are the learned threshold voltages. In this network we use a representation in which both the V's and the T's range from 0 to +1. The result of the summation is fed into a non-linear sigmoid function f(). All of the neurons in the network are interconnected, the outputs of each neuron feeding the inputs of every other neuron. The functional form of Eq. 1 might, for instance, represent several n-channel and p-channel MOSFETs in parallel. The memories in this network are contained in the threshold voltages T_ij.

We implement learning in this network using a simple linear Hebbian {Hebb, 1949} learning rule, one which locally reinforces the state of each input node in a neuron relative to the output of that neuron. The governing rule (Eq. 2) takes each initial threshold voltage T_ij to a new threshold voltage T'_ij after a time Δt. Here η is a small learning parameter related to this time period, and the offset factor 0.5 is needed for symmetry. Additional saturation constraints are imposed to ensure that the T'_ij remain in the interval 0 to +1. This learning rule is linear in the difference between each input and output of a neuron. It is an enhancing/inhibiting rule: the thresholds are adjusted in such a way that the output of the neuron is either pushed in the same direction as the input (enhancing) or pushed in the opposite direction (inhibiting). For our simple simulations we started the network with all thresholds at 0.5 and let learning proceed until some saturation occurred. The somewhat more sophisticated method of including a relaxation term in Eq. 2, to slowly push the values toward 0.5 over time, was also explored; the results are essentially the same as for our simple simulations.

The interesting question is: if we form a network using this type of neuron, what will the overall network response be like? Will the network learn multiple states, or will it learn a simple average over all of the states it sees? In order to probe the functioning of this network, we have performed simulations of this network on a digital computer. Each simulation was divided into two phases. The first was a learning phase in which a fixed number of random patterns were presented to the network sequentially for some period of time. During this phase the threshold

voltages were allowed to change using the rule in Eq. 2. The second was a testing phase in which learning was turned off and the memories established in the network were probed to determine their essential features. In this way we could test how well the network was able to learn the initial test patterns, how well the network could reconstruct the learned patterns when presented with test patterns containing errors, and how the network responded to random input patterns.

We have simulated this network using N fully interconnected neurons, with N in the range of 10 to 200. M random patterns were chosen and sequentially presented to the network for learning. M typically ranged up to N/3. After the learning phase, the nature of the stable states in the network was tested. In general we found that the network is capable of learning all of the input patterns as long as M is not too large. The network also learns the inverse patterns (1's and 0's interchanged) due to the inherent symmetry of the network. Additional extraneous patterns are learned which have no obvious connection to the intended learned states. These may be analogous to either the spin glass states or the mixed pattern states discussed for the multiplicative network {Amit, 1985}.

Fig. 1 shows the capacity of a 100 neuron network. We attempted to teach the network M states and then probed the network to see how many of the states were successfully learned. This process was repeated many times until we achieved good statistics. We have defined successful learning as 100% accuracy. A more relaxed definition would yield a qualitatively similar curve with larger capacity. The functional form of the learning is peaked at a fixed value of the number of input patterns. For a small number of input patterns, the network essentially learns all of the patterns. Deviations from perfect learning here generally mean 1 bit of information was learned incorrectly.
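The two-phase procedure can be sketched in a short simulation. This is a hedged reading, not the authors' code: the threshold update of Eq. 2 is paraphrased as a clipped linear move of each T_ij, signed by the output's position relative to the 0.5 offset, and the sigmoid centering, gain, learning rate, and network size are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def update(V, T, gain=8.0):
    """Synchronous pass of Eq. 1: V_i = f(sum_j |V_j - T_ij|), with the
    sigmoid centered on half the maximum summed distance (an assumption)
    so outputs can fall on either side of 0.5."""
    n = len(V)
    s = np.abs(V[None, :] - T).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-gain * (s - n / 2.0) / n))

def hebbian_step(T, V, eta=0.1):
    """Paraphrase of the Eq. 2 rule: move each threshold T_ij linearly
    toward or away from the input V_j, with the sign set by the output
    V_i relative to the 0.5 offset; clip to [0, 1] (saturation)."""
    T = T + eta * (V[:, None] - 0.5) * (V[None, :] - T)
    return np.clip(T, 0.0, 1.0)

N, M = 20, 3
patterns = rng.integers(0, 2, size=(M, N)).astype(float)
T = np.full((N, N), 0.5)              # all thresholds start at 0.5

# Learning phase: present the M random patterns sequentially.
for _ in range(50):
    for p in patterns:
        T = hebbian_step(T, p)

# Testing phase: learning off; relax a corrupted copy of a stored
# pattern (Hamming distance 2) toward a fixed point of the network.
probe = patterns[0].copy()
probe[:2] = 1.0 - probe[:2]
V = probe
for _ in range(30):
    V = update(V, T)
recalled = (V > 0.5).astype(float)    # binarize the settled state
```

Counting how many of the M patterns survive such probes at increasing corruption levels gives capacity and error-tolerance measurements of the kind reported below.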
Near the peak, the results become more noisy for different learning attempts. Most errors are still only 1 or 2 bits, but the learning in this region becomes marginal as the capacity of the network is approached. For larger numbers of input patterns the network becomes overloaded and incapable of learning most of the input states. Some small number of patterns are still learned, but the network is clearly not functioning well. Many of the errors in this region are large, showing little correlation with the intended learned states. This functional form for the learning in the network is the same for all of the network sizes tested.

We define the capacity of the network as the average value of the peak number of patterns which can be successfully learned. The inset to Fig. 1 shows the memory capacity of a number of tested networks as a function of the size of the network. The network capacity is seen to be a linear function of the network size. The capacity is proportional to the number of T_ij's specified. In this example the network capacity was found to be about 8% of the maximum possible for binary information. This rather low figure results from a trade-off of capacity for the particular types of functions that a neural network can perform. It is possible to construct simple memories with 100% capacity.

Figure 1. The number of successfully learned patterns as a function of the number of input patterns for a 100 neuron network. The dashed curve is for perfect learning. The inset shows the memory capacity of a threshold neural network as a function of the size of the network.

Some important measures of learning in the network are the distribution of stable states in the network after learning has taken place, and the basin of attraction for each stable point. One can gain a handle on these parameters by probing the network with random test patterns after the network has learned M states. Fig. 2 shows the averaged results of such tests for a 100 neuron network and varying numbers of learned states. The figure shows the probability of finding particular states, both learned and extraneous. The states are ordered first by decreasing

probability for the learned states, followed by decreasing probability for the extraneous states. It is clear from the figure that both types of stable states are present in the network. It is also clear that the probabilities of finding different patterns are not equal. Some learned states are more robust than others; that is, they have larger basins of attraction. This network model does not partition the available memory space equally among the input patterns. It also provides a large amount of memory space for the extraneous states. Clearly, this is not the optimum situation.

Figure 2. The probability of the network finding a specific pattern. Both learned states and extraneous states are found. The figure was obtained for a 100 neuron network. Fig. 2a is for 5 learned patterns and 2b is for 10 learned patterns.

Some of the learned states appear to have zero probability of being found in this simulation. Some of these states are not stable states of the network and will never be found. This is particularly true when the number of learned states is close to or exceeds the capacity of the network. Others of these states simply have an extremely small probability of being found in a random search because they have small basins of attraction. However, as discussed below, these are still viable states. When the network learns fewer states than its capacity (Fig. 2a),

most of the stable states are the learned states. As the capacity is approached or exceeded, most of the stable states are extraneous states.

The results shown in Fig. 2 address the question of the network's tolerance to errors. A pattern which has a large basin of attraction will be relatively tolerant to errors when being retrieved, whereas a pattern which has a small basin of attraction will be less tolerant of errors. The immunity of the learned patterns to retrieval errors can also be tested in a more direct way. One can probe the network with test patterns which start out as the learned patterns, but have a certain number of bits changed randomly. One then monitors the final pattern which the network finds and compares it to the known learned pattern.

Figure 3. Probability of the network finding a specific learned state when the input pattern has a certain Hamming distance. This figure was obtained for a 100 neuron network which was taught 10 random patterns.

Fig. 3 shows typical results of such a calculation. The probability of successfully retrieving a pattern is shown as a function of the Hamming distance, the number of bits which were randomly changed in the test pattern. For this simulation a 100 neuron network was used and it was taught 10 patterns. For small Hamming distances the patterns are successfully found 100% of the time. As the Hamming distance gets larger, the network is no longer capable of finding the desired pattern, but rather finds one of the other fixed points. This result is a statistical average over all of the states and therefore tends to emphasize patterns with small basins of attraction. This is just the opposite of the types of states emphasized in the analysis shown in Fig. 2. We can define the maximum Hamming distance as the Hamming distance at which the probability of finding the learned state has dropped to 50%. Fig. 4 shows the maximum Hamming distance as a function of the number of learned states in our 100 neuron network. As one expects, the maximum Hamming distance gets smaller as the number of learned states increases. Perhaps surprisingly, the relationship is linear. These results are important, since one requires a reasonable maximum Hamming distance for any real system to function. These considerations also shed some light on the nature of the functioning of the network and its ability to learn.

Figure 4. The maximum Hamming distance for a given number of learned states. Results are for a 100 neuron network.

This simulation gives us a picture of the way in which the network utilizes its phase space to store information. When only a few patterns are stored in the network, the network divides up the available space among these memories. The learning process is almost always successful. When a larger number of learned patterns are attempted, the available space is divided among more memories. The maximum Hamming distance decreases and more space is taken up by extraneous states. When the memory capacity is exceeded, the phase space allocated to any successful memory is very small and most of the space is taken up by extraneous states.

The types of behavior we have described are similar to those found in the Hopfield type memory utilizing multiplicative synapses. In fact, our central point is that by using a completely different type of synapse function, we can obtain the same behavior. At the same time, we argue that since this network was proposed using a synapse function which mirrors the operating characteristics of MOSFETs, it will be much easier to realize in hardware. Therefore, we should be able to construct a smaller, more tolerant network with the same operating characteristics. We do not mean to imply that the type of synapse function we have explored can only be used in a Hopfield type network. In fact, we feel that this type of neuron is quite general and can successfully be utilized in any type of network. This is at present just a conjecture which needs to be explored more fully. Perhaps the most important message from our work is the realization that one need not be constrained to the multiplicative type of synapse, and that other forms of synapses can perform similar functions in neural networks. This may open up many new avenues of investigation.

REFERENCES

D. J. Amit, H. Gutfreund and H. Sompolinsky, Phys. Rev. A 32, 1007 (1985).

A. Hartstein and R. H. Koch, IEEE Int. Conf. on Neural Networks (SOS Printing, San Diego, 1988), Vol. I, 425.

D. Hebb, The Organization of Behavior (Wiley, New York, 1949).

J. J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554 (1982).