
OPTIMIZATION OF TRAINING SETS FOR HEBBIAN-LEARNING-BASED CLASSIFIERS

Václav Kocian, Eva Volná, Michal Janošek, Martin Kotyrba
University of Ostrava, Department of Informatics and Computers
Dvořákova 7, 701 03 Ostrava 1, Czech Republic
vaclav.kocian@osu.cz

Abstract: The article deals with possibilities of optimizing classifiers based on neural networks that use the Hebbian learning mechanism. An experimental study was conducted, showing that badly designed learning patterns can, under certain circumstances, prevent the network from learning. The article introduces the new term of irrelevant items of input vectors. We also introduce an optimization method that helps to avoid the problems caused by such irrelevant items and thus makes the learning algorithm more robust. The method is separate from the classification algorithm itself, which makes it very easy to equip any arbitrary algorithm with it.

Keywords: neural networks, Hebbian learning, irrelevant items, pattern optimization, pattern preprocessing

1 Hebbian Networks

Hebbian learning theory can be summarized in the following rule: "Cells that fire together, wire together." [2] The rule seeks to explain "associative learning", in which simultaneous activation of cells leads to the strengthening of their links.

The main advantage of the Hebbian algorithm is its simplicity and thus its speed. The basic variant of the algorithm only needs the operations of addition and multiplication of integers. In addition, we can consider the repeatability of the calculation an advantage (calculations in the Hebbian algorithm are not burdened with randomness). This makes it relatively easy to study its behavior on specific training sets. Moreover, there is a possibility that the discovered regularities will be applicable to some other types of networks.

For a description of the learning process, we consider the trivial model network with one input and one output neuron connected by a single connection (see Fig. 1). In complex networks, these rules apply to all such triplets (input, output, connection). Neural networks are taught in so-called cycles. During each cycle, all the training patterns are presented to the network once. We derive formulas for calculating the value of the weight w after the submission of the n-th pattern. At the start, the weight w is initialized with the value I (I = 0 according to [3]):

    w_0 = I.

After the presentation of each (the n-th) pattern, the current value of w is increased by the product of the corresponding input and output:

    w_n = w_{n-1} + x_n * y_n,  n > 0.

Therefore we can express the weight value w at the end of the first cycle, i.e. after the presentation of all m patterns, as

    w_m = I + Σ_{i=1}^{m} x_i * y_i,

where the sum Σ_{i=1}^{m} x_i * y_i represents the change of w after one cycle. Since the set of patterns presented to the network in each cycle is always the same, we can label this sum C (i.e. the change of w after one learning cycle). The weight value at the end of the first cycle can then be written as w_1 = I + C. To calculate the value of w after the p-th cycle, we can use the expression w_p = I + p*C.
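To make the formulas above concrete, the following short Python sketch (ours, not the authors' implementation) applies the update w_n = w_{n-1} + x_n * y_n over several cycles on an assumed toy training set and checks that the weight grows as w_p = I + p*C.

    # Minimal sketch of the Hebbian update for a single (input, output) connection.
    # The training set below is an assumed example; values are bipolar (+1/-1).

    def weight_after_cycles(patterns, p, initial_weight=0):
        """Apply w_n = w_(n-1) + x_n * y_n over p full cycles of the training set."""
        w = initial_weight
        for _ in range(p):                 # one cycle = one pass over all patterns
            for x, y in patterns:
                w += x * y                 # Hebbian update for this pattern
        return w

    patterns = [(+1, +1), (-1, -1), (+1, -1), (-1, -1)]
    C = sum(x * y for x, y in patterns)    # change of w after one learning cycle
    for p in range(1, 5):
        assert weight_after_cycles(patterns, p) == 0 + p * C
    print("C =", C, "-> w_p = I + p*C holds for the cycles tested")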

Fig. 1: Trivial neural network considered in the description of the learning process.

Fig. 2: General topology of the classifier. Weights of connections w11-wij are modified in accordance with the Hebbian learning rule.

2 The original experimental study - motivation

We noticed an unexpected behavior of the classifier during our experiments with adaptation aimed at pattern recognition in time series [1]. It inspired us to study the influence of the shape of learning patterns on the ability of a neural network to adapt properly. The aim of our original experimental work was to test the ability of Hebbian networks to learn the fundamental trends from typical time series (rising, descent, resistance, support). We used two sets of artificially generated patterns. Both sets P1 (see Fig. 3, Table 1) and P2 (see Fig. 4, Table 2) contain patterns with the same meaning, but derived from the original data using different methods of binarization. Pattern bitmaps (bit arrays of size 8x8) were converted into one-dimensional vectors with a length of 64 bits by concatenating successive rows of the bitmap matrix. Each output vector T with 4 bits had only one of the bits active, which determined the number of the class assigned to the pattern, i.e. 1 - Rising; 2 - Descent; 3 - Resistance; 4 - Support.

Fig. 3: Patterns from set P1 - input pattern bitmaps (lower square) and required responses (upper rectangle).

Fig. 4: Patterns from set P2 - input pattern bitmaps (lower square) and required responses (upper rectangle).

Table 1: P1 - vectors T and S. Values of -1 are written using the character '-' and values of +1 using the character '+'. The eight groups of the S column correspond to the eight rows of the bitmap.

Pat  T     S
1    +---  -------+ ------+- -----+-- ----+--- ---+---- --+----- -+------ +-------
2    -+--  +------- -+------ --+----- ---+---- ----+--- -----+-- ------+- -------+
3    --+-  ---++--- ---++--- --+--+-- --+--+-- -+----+- -+----+- +------+ +------+
4    ---+  +------+ +------+ -+----+- -+----+- --+--+-- --+--+-- ---++--- ---++---

Table 2: P2 - vectors T and S. Values of -1 are written using the character '-' and values of +1 using the character '+'. The eight groups of the S column correspond to the eight rows of the bitmap.

Pat  T     S
1    +---  -------+ ------++ -----+++ ----++++ ---+++++ --++++++ -+++++++ ++++++++
2    -+--  ++++++++ -+++++++ --++++++ ---+++++ ----++++ -----+++ ------++ -------+
3    --+-  ---++--- ---++--- --++++-- --++++-- -++++++- -++++++- ++++++++ ++++++++
4    ---+  ++++++++ ++++++++ -++++++- -++++++- --++++-- --++++-- ---++--- ---++---
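For illustration, the conversion of an 8x8 pattern bitmap into a 64-item bipolar input vector and of a class number into the 4-bit target vector T could look like the following Python sketch; this is our reconstruction of the encoding used in Tables 1 and 2, and the function names are not from the paper.

    # '+' stands for +1 and '-' for -1, as in Tables 1 and 2.

    def bitmap_to_vector(rows):
        """Concatenate the successive rows of a bitmap into one bipolar vector."""
        return [+1 if ch == '+' else -1 for row in rows for ch in row]

    def class_to_target(class_number, n_classes=4):
        """Target vector T: only the bit of the assigned class is active (+1)."""
        return [+1 if i == class_number - 1 else -1 for i in range(n_classes)]

    # Example: the first two rows of pattern 1 ("Rising") from set P1.
    rows = ["-------+", "------+-"]        # the remaining six rows are omitted here
    print(bitmap_to_vector(rows))          # 16 of the 64 bipolar input values
    print(class_to_target(1))              # [1, -1, -1, -1]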

2.1 The original experiment procedure

First, patterns from the set P1 (Fig. 3, Table 1) were presented to the network. The network was able to recognize only two of the four submitted patterns. Then, patterns from the set P2 (Fig. 4, Table 2) were presented to the network. The network was able to learn all patterns correctly.

Finally, patterns from the set P1 were presented (in active mode) to the network that had been adapted to P2. The network was then able to classify all patterns from the set P1 correctly.

The original motivation for creating the set P2 was to verify the assumption that the presentation of "flat" patterns allows the network to obtain more general "knowledge" about the nature of the patterns. Such a network is then better able to detect sequences with a lower amplitude or a different slope of the curve. The experimental study seemed to confirm the correctness of this assumption. Moreover, if the "correct" patterns are presented to the network, it can learn to recognize even patterns that were impossible to learn separately.

3 Projection of the problem into simpler patterns

When analyzing the behavior described above, we repeated our experimental study with simpler patterns. We created two sets, R1 (Fig. 5, Table 3) and R2 (Fig. 6, Table 4). Each of them contains four patterns. The input patterns have length 6 and the output patterns have length 4. The behavior of the network working with these two sets was analogous to the original experiment. First, the network was not able to learn set R1. When the network was adapted with R2, it was able to correctly classify all patterns from both R2 and R1.

Fig. 5: Set R1. The network is not able to learn it.

Table 3: Set R1, vectors T and S. Values of -1 are written using the character '-' and values of +1 using the character '+'.

Pat  T     S
1    +---  +-----
2    -+--  -+----
3    --+-  --+---
4    ---+  ---+--

Fig. 6: Set R2. Once the network learns the patterns from R2, it can also classify the patterns from R1.

Table 4: Set R2, vectors T and S. Values of -1 are written using the character '-' and values of +1 using the character '+'.

Pat  T     S
1    +---  +---+-
2    -+--  -+--+-
3    --+-  --+--+
4    ---+  ---+-+

Patterns in the sets R1 and R2 differ only in the values of the 5th and 6th input items. While the values of these items are the same in all patterns of R1, they differ between the patterns of R2. Looking more carefully at both sets, we can see that the outputs simply "copy" the first four inputs, regardless of the values of the 5th and 6th input items. We can intuitively say that the 5th and the 6th item are both irrelevant.

3.1 Adaptation

The network topology which we used is shown in Fig. 7. For each of the sets R1 and R2, a separate instance of the classifier was created. Table 5 shows the network adaptation during the first learning cycle. Since the patterns in R1 and R2 differ only in the 5th and 6th input bits, the first six columns of Table 5 are the same for both sets. Columns 7 and 8 show the values for the 5th and 6th items from R1. Columns 9 and 10 show the values for the 5th and 6th items from R2. The closing rows of Table 5 show the weight values after the first learning cycle for both sets R1 and R2. We can state the following (a short verification sketch follows this list):
1. A total of twelve connections end the adaptation with zero weight values. Such connections can be considered redundant in terms of the network's capacity to remember or recognize patterns.
2. Every connection related to the 5th and 6th items has a non-zero value, i.e. these connections affect the work of the classifiers during both adaptive and active modes.
3. All connection weights w11, w22, w33 and w44 have the same value 4.
4. All connection weights wb1-wb4 (bias) have the same value -2.
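The following Python sketch (our reconstruction from the description above, with the bias input B = +1) runs one Hebbian learning cycle on the sets R1 and R2 from Tables 3 and 4 and prints the resulting weights, which should match the closing rows of Table 5 below.

    # One Hebbian learning cycle on the toy sets R1 and R2 (reconstruction, B = +1).

    def parse(s):                                       # '+' -> +1, '-' -> -1
        return [+1 if ch == '+' else -1 for ch in s]

    T = [parse(t) for t in ("+---", "-+--", "--+-", "---+")]
    S_R1 = [parse(s) for s in ("+-----", "-+----", "--+---", "---+--")]
    S_R2 = [parse(s) for s in ("+---+-", "-+--+-", "--+--+", "---+-+")]

    def hebbian_cycle(inputs, targets, bias=1):
        n_in, n_out = len(inputs[0]), len(targets[0])
        w = [[0] * n_out for _ in range(n_in + 1)]      # last row = bias weights
        for x, y in zip(inputs, targets):
            for j in range(n_out):
                for i in range(n_in):
                    w[i][j] += x[i] * y[j]              # w_ij <- w_ij + x_i * y_j
                w[n_in][j] += bias * y[j]               # bias weight w_bj
        return w

    for name, S in (("R1", S_R1), ("R2", S_R2)):
        print(name, "weights after one cycle (rows: inputs 1-6, then bias):")
        for row in hebbian_cycle(S, T):
            print(row)
    # Expected: w11 = w22 = w33 = w44 = 4, bias weights = -2, twelve zero-valued
    # connections, and non-zero weights for the 5th and 6th inputs in both cases.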

For better illustration, we present in Fig. 8 the structure of the neural network without the connections with zero weight values (marked as redundant). Between the adaptations to R1 and R2, the difference lies only in the weight values of the connections related to the 5th and 6th inputs.

Table 5: Evolution of weight values during the learning process on sets R1 and R2 (in the original table, items of the input and output vectors that take a positive value are highlighted in black). Column order: wbj, w1j, w2j, w3j, w4j, then w5j and w6j for R1, then w5j and w6j for R2.

Initialization
Y1:   0   0   0   0   0 | R1:  0   0 | R2:  0   0
Y2:   0   0   0   0   0 | R1:  0   0 | R2:  0   0
Y3:   0   0   0   0   0 | R1:  0   0 | R2:  0   0
Y4:   0   0   0   0   0 | R1:  0   0 | R2:  0   0

Step 1
Y1:   1   1  -1  -1  -1 | R1: -1  -1 | R2:  1  -1
Y2:  -1  -1   1   1   1 | R1:  1   1 | R2: -1   1
Y3:  -1  -1   1   1   1 | R1:  1   1 | R2: -1   1
Y4:  -1  -1   1   1   1 | R1:  1   1 | R2: -1   1

Step 2
Y1:   0   2  -2   0   0 | R1:  0   0 | R2:  0   0
Y2:   0  -2   2   0   0 | R1:  0   0 | R2:  0   0
Y3:  -2   0   0   2   2 | R1:  2   2 | R2: -2   2
Y4:  -2   0   0   2   2 | R1:  2   2 | R2: -2   2

Step 3
Y1:  -1   3  -1  -1   1 | R1:  1   1 | R2:  1  -1
Y2:  -1  -1   3  -1   1 | R1:  1   1 | R2:  1  -1
Y3:  -1  -1  -1   3   1 | R1:  1   1 | R2: -3   3
Y4:  -3   1   1   1   3 | R1:  3   3 | R2: -1   1

Step 4
Y1:  -2   4   0   0   0 | R1:  2   2 | R2:  2  -2
Y2:  -2   0   4   0   0 | R1:  2   2 | R2:  2  -2
Y3:  -2   0   0   4   0 | R1:  2   2 | R2: -2   2
Y4:  -2   0   0   0   4 | R1:  2   2 | R2: -2   2

Fig. 7: Topology of the neural network for processing patterns from the training sets R1 and R2 (B=1).

Fig. 8: Structure of the neural network (from Fig. 7) after adaptation on sets R1 and R2. Connections with zero weight values were omitted. In the case of R1, the values of the dotted and the dashed connections are identical (2); in the case of R2 they are opposite.
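A complementary sketch (ours; it repeats the helpers from the previous listing so that it can be run on its own) applies the learned weights in active mode with a simple bipolar threshold on the activations, reproducing the behavior described above: the network adapted to R1 fails on R1, while the network adapted to R2 classifies both R1 and R2 correctly.

    # Active-mode check: Y_j = sum_i x_i * w_ij + B * w_bj, thresholded at zero.

    def parse(s):
        return [+1 if ch == '+' else -1 for ch in s]

    T = [parse(t) for t in ("+---", "-+--", "--+-", "---+")]
    S_R1 = [parse(s) for s in ("+-----", "-+----", "--+---", "---+--")]
    S_R2 = [parse(s) for s in ("+---+-", "-+--+-", "--+--+", "---+-+")]

    def train(inputs, targets, bias=1):
        n_in, n_out = len(inputs[0]), len(targets[0])
        w = [[0] * n_out for _ in range(n_in + 1)]
        for x, y in zip(inputs, targets):
            for j in range(n_out):
                for i in range(n_in):
                    w[i][j] += x[i] * y[j]
                w[n_in][j] += bias * y[j]
        return w

    def classify(x, w, bias=1):
        acts = [sum(x[i] * w[i][j] for i in range(len(x))) + bias * w[len(x)][j]
                for j in range(len(w[0]))]
        return [+1 if a > 0 else -1 for a in acts]      # bipolar threshold

    w_r1, w_r2 = train(S_R1, T), train(S_R2, T)
    print("R1-trained on R1:", all(classify(x, w_r1) == t for x, t in zip(S_R1, T)))
    print("R2-trained on R2:", all(classify(x, w_r2) == t for x, t in zip(S_R2, T)))
    print("R2-trained on R1:", all(classify(x, w_r2) == t for x, t in zip(S_R1, T)))
    # Expected output: False, True, True.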

3.2 Analysis of adaptation results

Looking at the closing rows of Table 5 and at Fig. 8, it is possible to express the activation value of each output neuron after passing the training set as follows:

    Y_j = X_j * w_jj + X_5 * w_5j + X_6 * w_6j + B * w_bj.    (1)

Substituting the weight values after adaptation on R1 into equation (1), we obtain the following relation:

    Y_j = X_j * 4 + (-1) * 2 + (-1) * 2 + 1 * (-2).    (2)

Now we can generalize equation (2); after the n-th pass we get the neuron activation

    Y_j = X_j * 4n + (-1) * 2n + (-1) * 2n + 1 * (-2n),    (3)

which can be reduced to

    Y_j = n * (X_j * 4 - 6).    (4)

From equation (4) it is clear that the value for the set R1 can never be positive: X_j takes either the value +1 or -1, therefore Y_j can only take the values -2n or -10n. Since X_5 = X_6 = -1 for all patterns from the R1 set, the sum of their contributions to the value of each output neuron for each pattern equals -4n, and the network will never be able to successfully learn the patterns from the R1 set.

Substituting the values related to the R2 set into equation (1) in the same way as we did with R1, we get

    Y_j = n * (X_j * 4 ± 4 - 2),    (5)

where the sign of the ±4 term is determined by the values of X_5 and X_6. Formula (5) shows that the values of X_5 and X_6 help to deduce the correct class of the presented pattern (they restrict the choice to two possible classes). Their values in patterns 1 and 2 increase Y_1 and Y_2 by 4 while they reduce Y_3 and Y_4 by 4. Their values in patterns 3 and 4 do the opposite. The weights of the connections related to the 5th and 6th inputs are exactly opposite. This implies that if the values of X_5 and X_6 are the same (the case of the R1 set), their contribution to the activation value of each output for each pattern is zero. Therefore, a network adapted to R2 correctly identifies the patterns from R1 too.

4 Optimization of classifier

As we have shown in the previous example, the difficulty with training set R1 lies in the components X_5 and X_6, which have the same value in all patterns. Therefore, these components do not help us to assign the proper classes to the patterns. We can describe these components as excessive (irrelevant). Moreover, during the learning process, the connections related to these components acquire non-zero values, which leads to confusion, and the network loses its learning ability. Based on our experimental study, we proposed a method of evaluating the relevance of the input vector components. The principles of the method are simple:
1. Before adaptation, the algorithm walks through the training set and identifies as irrelevant all items whose value is the same in all patterns.
2. The weights of the connections related to the irrelevant items are ignored during the adaptation.
3. Thanks to that, such weights remain 0.
The algorithm that marks irrelevant items can be written as follows (a minimal sketch is given after the list):
1. Mark all items as irrelevant.
2. Load the input vector of the first pattern and remember the values of its items.
3. Repeat for all successive patterns:
   a. Load the input vector.
   b. Mark every irrelevant item as relevant if its actual value differs from that in the first pattern.
4. End.
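A minimal Python sketch of this marking procedure follows (our illustration; the function name and variables are not from the paper). During the adaptation, the weights of connections whose input item is flagged in this way are simply skipped and therefore remain zero.

    # Irrelevant-item detection (steps 1-4 above): an item is irrelevant when it
    # has the same value in every pattern of the training set.

    def parse(s):
        return [+1 if ch == '+' else -1 for ch in s]

    def find_irrelevant_items(input_vectors):
        """Return one flag per input item: True where the item is irrelevant."""
        first = input_vectors[0]
        irrelevant = [True] * len(first)         # step 1: mark all as irrelevant
        for vector in input_vectors[1:]:         # step 3: all successive patterns
            for i, value in enumerate(vector):
                if value != first[i]:            # step 3b: the value differs
                    irrelevant[i] = False        #          -> the item is relevant
        return irrelevant

    R1 = [parse(s) for s in ("+-----", "-+----", "--+---", "---+--")]
    R2 = [parse(s) for s in ("+---+-", "-+--+-", "--+--+", "---+-+")]
    print(find_irrelevant_items(R1))   # the 5th and 6th items are flagged
    print(find_irrelevant_items(R2))   # no irrelevant items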

The modified classifier is now able to adapt to both sets R1 and R2. With this preprocessing, the neural network becomes more specialized to the actual training set, i.e. it loses some of its generalization ability. Fig. 9 shows the topology of the network that uses the proposed algorithm to identify the irrelevant items, which are highlighted in gray. The related connections (dashed) are then ignored during the adaptation process. Fig. 10 shows the structure of the neural network after adaptation to R1; connections with zero weight values were excluded.

Fig. 9: Network topology for the R1 set after preprocessing. Items X5 and X6 are marked as irrelevant. The weight values of the related connections remain zero during the whole adaptation.

Fig. 10: The structure of the neural network after its adaptation to R1. Connections with zero weight values were excluded.

Finally, both original data sets P1 (see Fig. 3, Table 1) and P2 (see Fig. 4, Table 2) were presented to the adjusted classifier. Looking at Fig. 11 and Fig. 12, we can see the irrelevant items in both sets marked in gray. As expected, the classifier can now learn and correctly classify all training patterns of both sets P1 and P2. In this case, the adaptation to the P2 set does not lead to the correct classification of the patterns of P1, but the network behavior is in line with expectations: due to the elimination of the redundant items from the training sets, the network has lost some of its generalization ability.

Fig. 11: Patterns from the set P1 showing irrelevant components (gray).

Fig. 12: Patterns from the set P2 showing irrelevant components (gray).

The training set P3, which includes all patterns from the sets P1 and P2 (i.e. P3 = P1 ∪ P2), was designed in the final step of our experimental study. No irrelevant components were found in this united training set. Its adaptation then proceeded correctly and in accordance with expectations: all patterns from the P3 set were learned correctly.

5 Conclusion

In this experimental study we have managed to explain the cause of the unexpected behavior of the neural network that we had observed in previous time-series-related experiments [1]. We have designed, theoretically justified and experimentally tested a new method for preprocessing a training set. This method enhances the ability of a neural network to learn and classify patterns.

References

[1] Janošek, M., Kocian, V., Kotyrba, M., Volná, E.: Pattern recognition and system adaptation. In: Kováčová, M. (ed.): Proceedings of the 10th International Conference on Applied Mathematics, Aplimat 2011, Bratislava, Slovakia, 2011, pp. 1217-1226.
[2] Doidge, N.: The Brain That Changes Itself. Viking Press, 2007.
[3] Fausett, L. V.: Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice Hall, 1994.
[4] de Castro, L. N.: Fundamentals of Natural Computing. Chapman & Hall, 2006.
[5] Bishop, C. M.: Neural Networks for Pattern Recognition. Oxford University Press, 1997.