CP8206 Soft Computing & Machine Intelligence

PRINCIPLES OF ARTIFICIAL NEURAL NETWORKS

Important properties of artificial neural networks will be discussed, namely: (i) the underlying principle of artificial neural networks, (ii) the general representation of neural networks, & (iii) the principles of the error-correction algorithm.
ARTIFICIAL INTELLIGENCE & NEURAL NETWORKS

During the past twenty years, interest in applying the results of Artificial Intelligence (AI) research has grown rapidly. AI relates to the development of theories & techniques required for a computational engine to efficiently perceive, think & act with intelligence in complex environments. The artificial intelligence discipline is concerned with intelligent computer systems that exhibit the characteristics associated with intelligence in human behaviour, such as understanding language, learning, solving problems & reasoning.
BRANCHES OF AI

Developments in some branches of AI have already led to new technologies with significant effects on problem-solving approaches. These include new ways of defining problems, new methods of representing existing knowledge about problems & new problem-handling methods. There are several distinctive areas of research in Artificial Intelligence, most importantly artificial neural networks, fuzzy logic systems & expert systems, each with its own specific interests, research techniques, terminology & objectives (Fig. 1).
Fig. 1: Partial taxonomy of Artificial Intelligence, depicting a number of important AI branches & their relationships (Neural Networks, Expert Systems, Genetic Algorithms & Fuzzy Systems, together with hybrids such as Neuro-Fuzzy, Neuro-Genetic & Fuzzy-Expert systems)
NEURAL NETWORKS

Among the various branches of AI, the area of artificial neural networks in particular has received considerable attention during the past twenty years. An artificial neural network is a massively parallel & distributed processor that has a natural propensity for storing experiential knowledge & making it available for use. The underlying idea is to implement a processor that works in a fashion similar to the human brain.
NEURAL NETWORKS

A NN resembles the brain in two respects: first, knowledge is acquired through a learning process, & second, inter-neuron connection strengths known as weights are used to store the knowledge. The learning process involves modification of the connection weights to achieve a desired objective. Major applications of neural networks can be categorized into five groups: pattern recognition, image processing, signal processing, system identification & control.
NEURAL NETWORKS

There are a variety of definitions for artificial neural networks, each highlighting some aspect of the methodology, such as its similarity to its biological counterpart, its parallel computation capabilities, & its interaction with the outside world. A neural network is a non-programmable dynamic system, with capabilities such as trainability & adaptivity, that can be trained to store, process & retrieve information. It also possesses the ability to learn & to generalize based on past observations.
NEURAL NETWORKS

Neural networks owe their computing power to their parallel/distributed structure & the manner in which their activation functions are defined. This information-processing ability makes it possible to solve complex problems.

Function approximation (I/O mapping): the ability to approximate any nonlinear function to the desired degree.

Learning & generalization: the ability to learn I/O patterns, extract the hidden relationships among the presented data, & provide an acceptable response to new data that the network has not yet experienced. This enables neural networks to provide models based on imprecise information.
NEURAL NETWORKS

Adaptivity: capable of modifying their memory, & thus their functionality, over time.

Fault tolerance: owing to their highly parallel/distributed structure, failure of a number of neurons to generate the correct response does not lead to failure of the overall performance of the system.
NEURAL NETWORKS - DISADVANTAGES

- large dimension, which leads to memory restrictions;
- selection of the optimum configuration;
- convergence difficulty, especially when the solution is trapped in local minima;
- choice of training methodology;
- black-box representation: lack of explanation capabilities & transparency.
NEURAL NETWORKS

A neural network can be characterized in terms of:

- Neurons: the basic processing units, defining the manner in which computation is performed.
- Neuron activation functions: indicate the function of each neuron.
- Inter-neuron connection patterns: define the way neurons are connected to each other.
- Learning algorithms: define how the knowledge is stored in the network.
NEURON MODEL

The NN paradigm attempts to clone the physical structure & functionality of the biological neuron. Artificial neurons, like their biological counterparts, receive inputs, [x_1, x_2, ..., x_r], from the outside world or from other neurons through incoming connections. Each neuron then generates product terms, [w_i x_i], using the inputs & the connection weights ([w_1, w_2, ..., w_r], representing the connection memory). The product terms are then summed using an addition operator to produce the neuron internal activity index, v(t).
NEURON MODEL

This index is passed to an activation function, ϕ(.), which produces an output, y(t):

v(t) = Σ_{i=1}^{r} w_i x_i   (1)

y(t) = ϕ(v(t))   (2)

A more general model of the neuron functionality can be provided by the introduction of a threshold measure, w_0, for the activation function.
NEURON MODEL

This signifies the scenario where a neuron generates an output if its input is beyond the threshold (Fig. 2), i.e.,

y(t) = ϕ( Σ_{i=1}^{r} w_i x_i − w_0 )   (3)

This model is a simple yet useful approximation of the biological neuron & can be used to develop different neural structures, including feedforward & feedback networks (Fig. 3).
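As a minimal sketch (not part of the course notes, & assuming a sigmoid activation), Eq. (3) translates directly into code:

```python
import math

def neuron_output(x, w, w0, phi=lambda v: 1.0 / (1.0 + math.exp(-v))):
    """Eq. (3): weighted sum of the inputs minus the threshold w0,
    passed through the activation function phi (here a sigmoid)."""
    v = sum(wi * xi for wi, xi in zip(w, x)) - w0  # internal activity index
    return phi(v)
```

With the weighted sum exactly at the threshold, the internal activity is zero & the sigmoid yields 0.5; above the threshold the output rises towards 1.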
Fig. 2: Nonlinear model of a neuron (inputs x_1, ..., x_r & a fixed input ±1; synaptic operation through the weights w_k1, ..., w_kr; aggregation operation producing the internal activity a_k; somatic operation through the activation function ϕ(.) yielding the output y_k)
TYPES OF ACTIVATION FUNCTIONS

Each neuron includes a nonlinear function, known as the activation function, that transforms several weighted input signals into a single numerical output signal. The neuron activation function, ϕ(.), expresses the functionality of the neuron. There are at least three main classes of activation function: linear, sigmoid & Gaussian. Table 3.1 illustrates different types of activation functions.
NEURAL NETWORK ARCHITECTURES

The manner in which neurons are connected together defines the architecture of a neural network. These architectures can be classified into two main groups (Fig. 3):

- Feedforward neural networks
- Recurrent neural networks
Fig. 3: Classification of different neural network structures (feedforward — single-layer & multi-layer, e.g. the perceptron & radial basis function networks; recurrent — single-layer & multi-layer, e.g. Elman & Hopfield networks; lattice)
FEEDFORWARD NEURAL NETWORK

The flow of information is from input to output.

SINGLE-LAYER NETWORK (Fig. 4): the main body of the structure consists of only one layer (a one-dimensional vector) of neurons. It can be considered a linear association network that relates output patterns to input patterns.
Fig. 4: Single-layer feedforward neural network (inputs x_1, ..., x_r feeding a single layer of neurons ϕ(.) with outputs y_1, ..., y_r)
MULTI-LAYER NETWORK (Fig. 5): the structure consists of two or more layers of neurons. The function of the additional layers is to extract higher-order statistics. The network acquires a global perspective, despite its local connectivity, by virtue of the extra set of synaptic connections & the extra dimension of neural interaction. It is specified by:

- the number of inputs & outputs,
- the number of layers,
- the number of neurons in each layer,
- the network connection pattern, &
- the activation function for each layer.
Fig. 5: Multi-layer feedforward neural network (inputs x_1, ..., x_p; first layer with activation ϕ_1(.); second layer with activation ϕ_2(.) producing outputs y_1, ..., y_q)
RECURRENT NEURAL NETWORK

A recurrent structure represents a network in which there is at least one feedback connection. Fig. 6 depicts a multi-layer recurrent neural network, which is similar to the feedforward case except for the presence of feedback loops & z^{-1} (the unit delay operator), which introduces the delay involved in feeding the output back to the input.
Fig. 6: Multi-layer recurrent neural network (as in Fig. 5, with feedback connections from the outputs through unit delays z^{-1} back to the first layer)
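To make the role of the unit delay concrete, here is a hypothetical single-neuron sketch (the function names are illustrative, not from the notes): the output of the previous time step, held by z^{-1}, re-enters as an extra weighted input at the next step.

```python
import math

def recurrent_step(x, y_prev, w_in, w_fb):
    """One time step of a recurrent unit: the previous output y_prev,
    delayed by z^-1, is fed back through the weight w_fb alongside
    the current inputs x (a single sigmoid neuron for brevity)."""
    v = sum(wi * xi for wi, xi in zip(w_in, x)) + w_fb * y_prev
    return 1.0 / (1.0 + math.exp(-v))

def run_sequence(xs, w_in, w_fb, y0=0.0):
    """Run the unit over a sequence, threading the delayed output through."""
    y, outputs = y0, []
    for x in xs:
        y = recurrent_step(x, y, w_in, w_fb)
        outputs.append(y)
    return outputs
```

Note how identical inputs at successive steps produce different outputs, because the fed-back state differs.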
Table 3.1: Neural Network Activation Functions

Piecewise linear: f(x) = −1 if x < −b; a·x if −b ≤ x ≤ b; +1 if x > b

Linear: f(x) = a·x

Indicator: f(x) = sgn(x)
Sigmoid: f(x) = 1 / (1 + e^{−a·x})

Bipolar sigmoid: f(x) = tanh(a·x)

Gaussian: f(x) = e^{−x² / (2σ²)}
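The entries of Table 3.1 are simple one-liners in code; a sketch in Python (the function names are mine, not standard library names):

```python
import math

def linear(x, a=1.0):
    return a * x

def piecewise_linear(x, a=1.0, b=1.0):
    # linear inside the band |x| <= b, saturating at -1 and +1 outside it
    if x < -b:
        return -1.0
    if x > b:
        return 1.0
    return a * x

def indicator(x):
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)  # sgn(x)

def sigmoid(x, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * x))

def bipolar_sigmoid(x, a=1.0):
    return math.tanh(a * x)

def gaussian(x, sigma=1.0):
    return math.exp(-x * x / (2.0 * sigma * sigma))
```

The sigmoid & bipolar sigmoid are the smooth, differentiable choices used later for back-propagation; the indicator is not differentiable & suits threshold units only.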
MULTI-LAYER PERCEPTRON (MLP)

A class of NNs consisting of one input layer & one output layer, which represent the system inputs & outputs respectively, together with one or more hidden layers that provide the learning capability of the network (Fig. 7). The basic element of an MLP network is an artificial neuron whose activation function, in the hidden layer, is a smooth, differentiable function (usually a sigmoid). The neurons in the output layer have a linear activation function.
f(x_1, ..., x_n) = Σ_{i=1}^{m} ω_i g( Σ_{j=1}^{n} w_ij x_j − θ_i )

with the sigmoid function g(x) = 1 / (1 + e^{−x}), where w_ij & b_ij are the weights & biases of the hidden layer, j = 1, ..., n indexes the inputs, i = 1, ..., m indexes the hidden neurons, & ω denotes the weights of the (linear) output layer.

Fig. 7: General structure of a Multi-Layer Perceptron network, illustrating the concept of input, hidden & output layers
MLP

The output of an MLP network, therefore, can be represented as follows:

F(x_1, ..., x_p) = Σ_{i=1}^{M} ω_i g( Σ_{j=1}^{p} w_ij x_j − θ_i )   (4)

where the inner sum is the internal activation, g(·) of it is the hidden-layer output, & the outer weighted sum is the output-layer output. F(·) is the network output, [x_1, ..., x_p] is the input vector with p inputs, M denotes the number of hidden neurons, w represents the hidden-layer connection weights, θ_i is the threshold value associated with hidden neuron i, & ω represents the output-layer connection weights, which in effect serve as coefficients of the linear output function.
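Eq. (4) maps directly onto a few lines of code. The sketch below (a hypothetical helper, assuming sigmoid hidden units & a linear output unit as stated above) computes F for one input vector:

```python
import math

def mlp_output(x, w, theta, omega):
    """Eq. (4): hidden neuron i computes g(sum_j w[i][j]*x[j] - theta[i]);
    the linear output unit returns sum_i omega[i] * hidden[i]."""
    g = lambda v: 1.0 / (1.0 + math.exp(-v))  # sigmoid hidden activation
    hidden = [g(sum(wij * xj for wij, xj in zip(w_i, x)) - th_i)
              for w_i, th_i in zip(w, theta)]
    return sum(om_i * h_i for om_i, h_i in zip(omega, hidden))
```

For example, a single hidden neuron whose activation sits exactly at its threshold outputs g(0) = 0.5, so with ω = [2] the network output is 1.0.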
UNIVERSAL APPROXIMATION

It has been proven mathematically that standard multi-layer perceptron networks using arbitrary squashing functions are capable of approximating any continuous function from one finite-dimensional space to another to any desired degree of accuracy, provided sufficient hidden neurons are available. A squashing function is a non-decreasing function σ(·) such that

σ(t) → 1 as t → +∞,  σ(t) → 0 as t → −∞.   (6)
UNIVERSAL APPROXIMATION

It has been further shown that this approximation can be achieved using a multi-layer perceptron with only one hidden layer & a sigmoid activation function. MLPs are therefore a class of universal approximators & can be used successfully to solve difficult problems in diverse areas using the error back-propagation learning algorithm. Failure in learning can be attributed to factors such as inadequate learning, an insufficient number of hidden neurons, or a non-deterministic relationship between inputs & outputs.
THE STONE-WEIERSTRASS THEOREM

The Stone-Weierstrass theorem can be used to prove that NNs are capable of uniformly approximating any real continuous function on a compact set to an arbitrary degree of accuracy. It states that for any given real continuous function, f, on a compact set U ⊂ R^n, there exists an NN, F, that is an approximate realization of the function f(·):

F(x_1, ..., x_p) = Σ_{i=1}^{M} ω_i ϕ( Σ_{j=1}^{p} w_ij x_j − θ_i )   (7)

|F(x_1, ..., x_p) − f(x_1, ..., x_p)| < ε   (8)

for all {x_1, ..., x_p} ∈ U, where X = (x_1, ..., x_p) ∈ U represents the input space & ε denotes the approximation error, an arbitrarily small positive value.
LEARNING PROCESS

Learning is accomplished through associations between different I/O patterns. Regularities & irregularities in the training data are extracted, & consequently validated using validation data. It is achieved by stimulating the network with data representing the function to be learned & attempting to optimize a related performance measure. It is assumed that the data represent a system that is deterministic in nature but with unknown probability distributions.
LEARNING PROCESS

The fashion in which the parameters are adjusted determines the type of learning. There are two general learning paradigms (Fig. 8):

- Unsupervised learning
- Supervised learning

Unsupervised learning is not within the scope of this course & will not be discussed.
Fig. 8: A classification of learning algorithms (supervised — back-propagation, the Widrow-Hoff rule & the perceptron rule; unsupervised — associative (Hebbian) & self-organizing (Kohonen, competitive) learning)
SUPERVISED LEARNING

The organization & training of a neural network by a combination of repeated presentation of input patterns & their associated output patterns; equivalent to adjusting the network weights. In supervised learning, a set of training data is used to help the network arrive at appropriate connection weights. This can be seen in the conventional delta rule, one of the early supervised algorithms, developed from the work of McCulloch & Pitts & of Rosenblatt. In this method, a training data set is always available that provides the system with the ideal output values for a set of known inputs, & the goal is to obtain the strength of each connection in the network.
BACK-PROPAGATION

The best-known supervised learning algorithm. This learning rule was first developed by Werbos & later improved by Rumelhart et al. Learning is done on the basis of direct comparison of the output of the network with known correct answers. It is an efficient method of computing the change in each connection weight in a multi-layer network so as to reduce the error in the outputs, & works by propagating errors backwards from the output layer to the input layer.
BACK-PROPAGATION

Assume that w_ji denotes the connection weight from the i-th neuron to the j-th, x_j signifies the total input to the j-th neuron, y_j represents the corresponding output, & d_j is the desired output.

Total input to unit j: x_j = Σ_i y_i w_ji   (9)

Output from unit j: y_j = 1 / (1 + e^{−x_j})   (10)
The back-propagation algorithm attempts to minimize the global error which, for a given set of weights, is the squared difference between the actual & desired outputs, summed over all output units j & training cases c:

E = (1/2) Σ_c Σ_j ( y_{j,c} − d_{j,c} )²   (11)

where E denotes the global error. The error derivatives for all weights can be computed by working backwards from the output units after a case has been presented; given the derivatives, the weights are updated to reduce the error.
∂E/∂y_j = y_j − d_j

∂E/∂x_j = (∂E/∂y_j) · (∂y_j/∂x_j) = (∂E/∂y_j) · y_j (1 − y_j)

∂E/∂w_ji = (∂E/∂x_j) · (∂x_j/∂w_ji) = (∂E/∂x_j) · y_i

∂E/∂y_i = Σ_j (∂E/∂x_j) · (∂x_j/∂y_i) = Σ_j (∂E/∂x_j) · w_ji

Fig. 9: Basic idea of the back-propagation learning algorithm
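The chain of derivatives in Fig. 9 can be written out for one layer of sigmoid units. A sketch (the helper name & argument layout are mine) that returns the outputs, the weight gradients ∂E/∂w_ji & the back-propagated input gradients ∂E/∂y_i for a single training case:

```python
import math

def error_derivatives(y_in, w, d):
    """Eqs. (9)-(11) and Fig. 9 for one layer of sigmoid units.
    y_in: outputs of the previous layer (the y_i);
    w[j][i]: connection weight from unit i to unit j;
    d: desired outputs d_j."""
    x = [sum(wji * yi for wji, yi in zip(wj, y_in)) for wj in w]   # Eq. (9)
    y = [1.0 / (1.0 + math.exp(-xj)) for xj in x]                  # Eq. (10)
    dE_dy = [yj - dj for yj, dj in zip(y, d)]                      # dE/dy_j
    dE_dx = [dyj * yj * (1.0 - yj) for dyj, yj in zip(dE_dy, y)]   # dE/dx_j
    dE_dw = [[dxj * yi for yi in y_in] for dxj in dE_dx]           # dE/dw_ji
    dE_dyin = [sum(dE_dx[j] * w[j][i] for j in range(len(w)))      # dE/dy_i
               for i in range(len(y_in))]
    return y, dE_dw, dE_dyin
```

The last quantity, ∂E/∂y_i, is what gets handed to the layer below, which is exactly how the error is propagated backwards through a multi-layer network.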
BACK-PROPAGATION

Consists of two passes: forward & backward.

Forward pass: a training case is presented to the network. The training case itself consists of an input vector & its associated (desired) output.

Backward pass: starts when the output error, i.e., the difference between the desired & actual output, is propagated back through the network, & changes are made to the connection weights in order to reduce the output error.

Different training cases are then presented to the network. The process of presenting epochs of training cases to the network continues until the average error over the entire training set reaches a defined error goal.
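A toy end-to-end sketch of the two passes & the epoch loop. This uses a single sigmoid unit rather than a full multi-layer network, & the learning rate, epoch count & error goal are illustrative choices, not values from the notes:

```python
import math
import random

def train(data, n_in, lr=0.5, epochs=2000, goal=0.01):
    """For each training case: a forward pass computes the output, the
    backward pass updates the weights by gradient descent on the squared
    error; epochs repeat until the average error meets the goal."""
    w = [random.uniform(-0.5, 0.5) for _ in range(n_in)]
    w0 = 0.0  # threshold, trained like a weight
    for _ in range(epochs):
        total = 0.0
        for x, d in data:
            # forward pass (Eqs. 9-10 for one unit, with threshold w0)
            v = sum(wi * xi for wi, xi in zip(w, x)) - w0
            y = 1.0 / (1.0 + math.exp(-v))
            e = y - d
            total += 0.5 * e * e
            # backward pass: dE/dv for a sigmoid unit, then weight updates
            delta = e * y * (1.0 - y)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            w0 += lr * delta  # dv/dw0 = -1, so the sign flips
        if total / len(data) < goal:  # average error goal reached
            break
    return w, w0
```

Trained on a small linearly separable pattern set (e.g. logical OR), the unit learns to put each case on the correct side of 0.5.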
Fig. 10: Basic presentation of the back-propagation learning algorithm:

1. Define the network structure, connection pattern, activation functions & performance measure; prepare the training & validation data.
2. Provide a stimulus from the training set to the network; the feedforward flow of information generates the output & the performance measure.
3. If the performance measure is not satisfactory, the error is back-propagated through the network & changes proportional to the derivative of the error with respect to each weight are made to the synaptic weights; return to step 2.
4. When the training performance is satisfactory, provide a stimulus from the validation set to the network; the feedforward flow of information generates the output & the performance measure.
5. If the validation performance is satisfactory, training ends; otherwise return to training.