A Neural Network Model For Concept Formation


Jiawei Chen, Yan Liu, Qinghua Chen, Jiaxin Cui
Department of Systems Science, School of Management, Beijing Normal University, Beijing 100875, P.R. China. chenjiawei@bnu.edu.cn

Fukang Fang
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing 100875, P.R. China. fkfang@bnu.edu.cn

Abstract

The acquisition of abstract concepts is a key step in the development of human intelligence, but the neural mechanism of concept formation is not yet clear. Research on complexity and self-organization theory indicates that a concept is an emergent result of the neural system and should be represented by an attractor. Associative learning and hypothesis elimination are considered the mechanisms of concept formation, and we argue that the Hebbian learning rule can describe both operations. In this paper, a neural network is constructed based on the Hopfield model, and its weights are updated according to the Hebbian rule. The forming processes of a natural concept, a number concept and an addition concept are simulated with the model. Facts from neuroanatomy provide some evidence for the model.

1 Introduction

How a concept is abstracted from concrete instances that share common features is an important and difficult problem. Over the last 100 years, concept formation and concept development have been discussed by psychologists mainly through behavioral experiments [10], but the neural mechanisms of these processes remain unclear. Recently, neural network models [1, 2] have been used to study concept formation from the angle of word learning. This approach first requires explaining how a concept is represented in the neural system, i.e. how a cognitive state such as a concept or a memory is represented by a physical system consisting of neurons and the connections between them. Hopfield addressed this problem using ideas about emergence, and his model has been applied frequently in the field of cognition. The Hopfield network [7] showed that an attractor dominating a substantial region of phase space around it can represent a nominally assigned memory. Like memory, a concept is also a cognitive state and should be represented by an attractor of the physical system. Collective behaviors are clearly more appropriate than individual units for expressing cognitive states, because they are more robust and stable. Several attractor networks [6, 4] have been created to study questions about language learning. A concrete instance can be represented by many features, and language is perhaps the best way to describe these features. Word meanings carve up the world in complex ways, such that an entity, action, property, or relation can typically be labeled by multiple words [13]. Language can be seen as a simple and complete projection of the real world. The early stages of word learning are often used to study concept formation [13, 12]. Although these works mainly focus on the acquisition of concrete concepts, abstract concepts can also be studied using language as the research object. Some Chinese characters are pictographs, and their graphemes can express more meaning than the graphemes of alphabetic writing systems. From a modeling point of view, Chinese characters are therefore more suitable than other languages for exploring the neural mechanisms of concept formation.
In this article, the process by which features are extracted from samples is simulated and the underlying neural mechanisms are discussed. A model based on the Hopfield network is constructed, and the connection weights are updated using a variant of the Hebb learning rule. Samples that share some common features, while each also having its own special features, are used to train the model. The weight states and test results indicate that the common features of the samples can be extracted by the model and represented by an attractor of the system. In the next section, the model is constructed, including its architecture, weight update algorithm, samples and training procedure. In the following section, three groups of samples are used to train the model and the simulation results are presented. Finally, the neural mechanism of concept formation is summarized and explained.

2 Model

Let us consider a fully connected recurrent neural network, which is a variant of the Hopfield model. The details of the network, such as the architecture, weight adjustment, samples and training, are described below.

2.1 Architecture Of The Network

Our neural network has a single layer composed of N neurons. Each neuron i has two states, V_i = 0 or V_i = 1, which denote not firing and firing at maximum rate respectively. The instantaneous state of the system is specified by listing the N values of V_i, so it is represented by a binary word of N bits. The network is fully connected, i.e. every neuron is connected with every other neuron. The strength of the connection from neuron j to neuron i is denoted w_{ij}. We require 0 <= w_{ij} <= 1 for all i, j, and w_{ii} = 0 for all i. How the system processes information is determined by the current weight state. Because there is only one layer in our model, each neuron both receives the input vector from the environment and expresses the output result. An input vector must therefore be represented by a binary word of N bits so that it matches the neuron states. For example, the ith input vector can be written as X_i = [x_{i,1}, x_{i,2}, ..., x_{i,N}], with x_{i,j} = 0 or 1. The output of the network is represented by the states of all N neurons.

2.2 Weight Update Algorithm

All the neuron states must be determined before the weights are updated. Two cases are considered when calculating the neuron states. On the one hand, when a sample is input to the network, the neuron states are set equal to the input vector, i.e. V_i = x_{c,i}, where x_{c,i} denotes the ith component of the current input vector. On the other hand, when no external instance is provided, the neuron states change with time according to

V_i = \mathrm{hardlim}\left( \sum_{j=1}^{N} w_{ij} V_j - \theta_i \right)    (1)

where \theta_i denotes the threshold of the ith neuron. In a given system, we assume that all thresholds are equal to a constant \theta. In the formal theory of neural networks, the weight w_{ij} is considered a parameter that can be adjusted to optimize the performance of the network for a given task. In our model, we assume that the weights are updated according to the Hebbian learning rule [5], i.e. the network learns by strengthening the connection weights between neurons that are activated at the same time. This can be written as

\Delta w_{ij} = \begin{cases} \eta w_{ij} - d, & \text{if } V_i = 1, V_j = 1 \\ -\eta w_{ij} - d, & \text{if } V_i = 1, V_j = 0 \\ -\eta w_{ij} - d, & \text{if } V_i = 0, V_j = 1 \\ -d, & \text{if } V_i = 0, V_j = 0 \end{cases}    (2)

where 0 < \eta < 1 is a small constant called the learning rate, and d is a small positive constant describing the rate at which w_{ij} decays back to zero in the absence of stimulation. Of course, equation (2) is just one possible way to specify rules for the growth and decay of the weights, and it differs somewhat from other forms of the Hebb rule [3]. From formula (2) we can see that the synaptic efficacy w_{ij} would grow without limit if the same potentiating stimulus were applied over and over again, so a saturation of the weights must be considered. In addition, the synaptic efficacy w_{ij} should remain non-negative. These two restrictions are achieved by setting

w_{ij}(t+1) = \begin{cases} 1, & \text{if } w_{ij}(t) + \Delta w_{ij} > 1 \\ w_{ij}(t) + \Delta w_{ij}, & \text{if } 0 \le w_{ij}(t) + \Delta w_{ij} \le 1 \\ 0, & \text{if } w_{ij}(t) + \Delta w_{ij} < 0 \end{cases}    (3)
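For concreteness, the following Python sketch shows one way to implement the state update (1) and the weight update (2)-(3) as reconstructed above. It is an illustrative sketch rather than the authors' code: the vectorized form, function names and the exact sign pattern of the piecewise rule are our own reading of equation (2).

    # Minimal sketch (not the authors' code) of the state update (1) and the
    # Hebbian weight update with decay and clipping (2)-(3); all names are
    # illustrative, and the piecewise rule follows the reconstruction above.
    import numpy as np

    def update_states(V, W, theta):
        """Equation (1): V_i = hardlim(sum_j w_ij V_j - theta)."""
        return (W @ V - theta >= 0).astype(int)

    def update_weights(W, V, eta=0.25, d=0.05):
        """Equations (2)-(3): Hebbian growth/decay followed by clipping to [0, 1]."""
        both_on = np.outer(V, V)                             # 1 where both neurons fire
        one_on = np.outer(V, 1 - V) + np.outer(1 - V, V)     # exactly one of the pair fires
        dW = eta * W * both_on - eta * W * one_on - d        # the -d decay applies in every case
        W_new = np.clip(W + dW, 0.0, 1.0)                    # saturation and non-negativity (3)
        np.fill_diagonal(W_new, 0.0)                         # keep w_ii = 0
        return W_new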
2.3 Samples For Training

In our model, the features of a sample are represented by the dot matrix of Chinese characters, with each element of the matrix denoting one feature. The value of each element is 1 or 0, indicating whether or not the sample has the corresponding feature. Since each neuron in the network has only two states, any input vector must be represented by a binary word so that it matches the neuron states. A combination of a few Chinese characters is chosen as a sample for our model. Each sample is represented by an m x 16 x 16 dot matrix, where m denotes the number of Chinese characters. Matrix elements lying on a character's strokes are set to 1, and all others are set to 0. Finally, the input vector is obtained by converting the dot matrix into a vector; an example of an instance is shown in Figure 1. Note that different training sets are used for different experimental purposes. Every sample within a training set has the same number of characters, but samples belonging to different sets may have different numbers of characters. The number of neurons in the network is determined by the number of characters in each sample. For example, a network using two-character samples consists of 2 x 16 x 16 = 512 neurons.

Figure 1. The representation of an instance. (A) The instance includes 2 characters; (B) the instance represented by a dot matrix; (C) the input vector obtained by stacking the dot matrix into one column.
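A short sketch of the encoding step may help make this concrete. It assumes the m characters have already been rasterized to 16 x 16 binary grids; the rasterization itself is not shown, and the function name is ours.

    # Illustrative sketch of turning a sample's dot matrices into an input vector.
    import numpy as np

    def sample_to_input_vector(char_matrices):
        """char_matrices: list of m arrays of shape (16, 16) with entries 0/1.
        Returns a binary vector of length m * 16 * 16, matching the N neurons."""
        stacked = np.stack(char_matrices)          # shape (m, 16, 16)
        return stacked.reshape(-1).astype(int)     # flatten the dots into one column

    # Example: a two-character sample yields a vector with N = 2 * 16 * 16 = 512 components.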

2.4 Training And Testing

Training is a procedure in which the weights are updated iteratively according to the external input. In a given experiment, the training set for the network consists of k samples with some identical properties. During each epoch, an instance randomly selected from the training set is shown to the network. The neuron states are set equal to the input vector, and the weights are updated according to formulae (2) and (3). The network is trained repeatedly until the weight matrix changes only over a small range. After learning a training set in which the instances share some identical features, does the network know these features? We address this question by presenting the network with some input patterns and examining the output patterns of the network. If the network has learned these features, then it will evolve to a stable state that denotes the concept whenever a sample with all or most of the identical features is shown to it. In the testing procedure, all of the weights are fixed, a test sample is shown to the network, and the output of the network is calculated according to formula (1).
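The training and testing procedures just described can be sketched as follows, reusing the illustrative helpers defined above. The parameter values mirror the horse experiment in the next section (N = 512, eta = 0.25, d = 0.05, 150 epochs), but the convergence check via a fixed iteration cap is our own assumption.

    # Sketch of Section 2.4: clamp the states to a random training sample,
    # update the weights, repeat; then test with fixed weights by iterating (1).
    import numpy as np

    def train(samples, n_epochs=150, eta=0.25, d=0.05, seed=0):
        rng = np.random.default_rng(seed)
        N = samples[0].size
        W = rng.uniform(0.2, 1.0, size=(N, N))   # initial weights drawn in (0.2, 1)
        W = np.triu(W, 1)
        W = W + W.T                              # symmetric, with w_ii = 0
        for _ in range(n_epochs):
            V = samples[rng.integers(len(samples))]   # clamp neuron states to the sample
            W = update_weights(W, V, eta, d)
        return W

    def test(W, probe, theta=30, max_iters=50):
        """Iterate equation (1) with fixed weights until the state settles."""
        V = probe.copy()
        for _ in range(max_iters):
            V_next = update_states(V, W, theta)
            if np.array_equal(V_next, V):
                break
            V = V_next
        return V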

3 Simulation Results

First, we simulate how the concept horse is extracted from samples. For a horse sample, we consider two kinds of features: shape and color. The shape is the essential feature that all horses share, and the color is the special feature that each sample has solely on its own. The model simulates the process in which the abstract concept horse is extracted from several samples of horses with different colors by drawing out the common features and eliminating the unique characteristics. The concept should correspond to an attractor of our model.

Figure 2. The training set includes six instances.

The six samples shown in Figure 2 are used in the model, i.e. k = 6. The number of neurons, N = 512, is determined by counting the dots of any sample. Before training, we initialize the weights randomly between 0.2 and 1, i.e.

0.2 < w_{ij}(0) = w_{ji}(0) < 1 for all i, j with i \neq j, and w_{ii}(0) = 0 for all i.    (4)

The initial weights are shown in Figure 3(A). The other parameters are set to \eta = 0.25 and d = 0.05. The network is trained 150 times with samples selected randomly from the training set, and the resulting weights are shown in Figure 3(B).

Figure 3. The weight matrix evolves from a random initial state to a stable final state. (A) The initial state; (B) the final state.

By comparing the trained weights with the initial ones, we obtain the following results:
1. The weight matrix changes from a random distribution to a stable state during the training process, and no obvious change occurs once the weights reach the stable state.
2. From Figure 3 we can see that the number of connections between neurons is massively reduced, while the average connection strength changes from 0.600 to 0.999 during training. More connections indicate more plasticity of the network, while the strong and stable connections presumably denote particular cognitive patterns.
3. Because of the similarity between the individual characters of the samples, some elements in the top-left quarter of the weight matrix are not 0. All elements except those in the bottom-right quarter would change to 0 if the number of samples in the training set were increased.

The stable state of the weights is an attractor of the system, and the fixed point corresponds to a cognitive pattern, the concept horse, extracted from the samples. For the theoretical analysis of the collective behavior of the neurons, we refer to Hopfield's work. We can also examine the attractor and the cognitive state of the system directly using the three test samples displayed in Figure 4(A), (B) and (C). Here, we set the parameter \theta = 30. The network weights are fixed and the samples are input to the network one at a time. The output of the network is calculated by formula (1), and the test results are also shown in Figure 4. In phase space, the three samples are dominated by an attractor, which is the nominally assigned concept horse, and they eventually settle into the attractor state. On the other hand, a sample that does not lie in the basin of the attractor will not evolve to the stable state; an example is shown in Figure 4(D).

Figure 4. The attractor of the network is tested with three positive examples and one negative instance. (A) A sample arbitrarily selected from the training set; (B) an incomplete sample that includes most, but not all, features of the concept horse; (C) a sample that includes all features of horse, with its individual features given arbitrarily; (D) a sample that includes only a few features of the concept horse, although its individual features were used in the training process.

Our model simulates the forming process of a natural concept using the horse as an example. In fact, any class of concept formed by extracting common features from concrete instances can be simulated with our model, such as the concepts of natural number and addition. The simulation results are shown in Figure 5.

Figure 5. Two other examples of concept formation. (A) The number concept 3 is extracted from the six samples; (B) the addition concept 2+3=5 is extracted from the six samples.

Certainly, the concept of number has many connotations, including the concrete concept, the abstract concept, the ordering concept and number structure [9]. Our model only simulates the emergence of the abstract concept from the concrete concept.

4 Discussion

As mentioned above, concept formation has been discussed from the angle of word learning, and two broad classes of proposals for how word learning works have dominated the literature: hypothesis elimination and associative learning. We consider the union of these two operations to be the mechanism of concept formation. On the one hand, the features that all samples share are called essential features, and the connections between neurons that represent essential features are strengthened during training; this is associative learning at work. On the other hand, the connections among individual features, and between individual features and essential features, gradually become weaker and weaker; this is hypothesis elimination at work. Both operations can be precisely described by the Hebbian learning rule, so under certain conditions the Hebbian learning rule should be the neural mechanism of concept formation. The plausibility of our model is supported by facts from neuroanatomy. Experiments indicate that the number of connections between neurons is massively reduced in the adult compared to the infant. In the cat, for example, there is a huge decrease in the number of callosal axons during neonatal life, and a 90% reduction in the number of synapses and branches of the axonal arbors of the remaining fibres [8, 11]. This is similar to our simulation results. However, the process of concept formation is very complicated, and its essence is emergence.

The Hebbian learning rule can perhaps capture the formation mechanism of some simple concepts. For complex scientific and social concepts, more kinds of factors and more complex mechanisms should be considered.

Acknowledgement

This work is supported by NSFC under grants No.60534080, No.60374010 and No.70471080.

References

[1] E. Colunga, L. B. Smith, From the lexicon to expectations about kinds: A role for associative learning, Psychological Review 112 (2005) 347-382.
[2] M. Gasser, L. B. Smith, Learning nouns and adjectives: A connectionist approach, Language and Cognitive Processes 13 (1998) 269-306.
[3] W. Gerstner, W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity, Cambridge University Press, Cambridge, 2002.
[4] M. W. Harm, M. S. Seidenberg, Phonology, reading acquisition, and dyslexia: Insights from connectionist models, Psychological Review 106 (1999) 491-528.
[5] D. O. Hebb, The Organization of Behavior, Wiley, New York, 1949.
[6] G. E. Hinton, T. Shallice, Lesioning an attractor network: Investigations of acquired dyslexia, Psychological Review 98 (1991) 74-95.
[7] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA 79 (1982) 2554-2558.
[8] G. M. Innocenti, Exuberant development of connections, and its possible permissive role in cortical evolution, Trends Neurosci. 18 (1995) 397-402.
[9] C. Lin, The study on the development of the number concept and operational ability in schoolchildren, Acta Psychologica Sinica 3 (1981) 289-298.
[10] E. Machery, 100 years of psychology of concepts: the theoretical notion of concept and its operationalization, Studies in History and Philosophy of Biological and Biomedical Sciences 38 (2007) 63-84.
[11] B. Payne, H. Pearson, P. Cornwell, Development of visual and auditory cortical connections in cat, Cerebral Cortex 7 (1988) 309-389.
[12] T. Regier, The emergence of words: Attentional learning in form and meaning, Cognitive Science 29 (2005) 819-865.
[13] F. Xu, J. B. Tenenbaum, Word learning as Bayesian inference, Psychological Review 114 (2007) 245-272.