CHAPTER 1 AN OVERVIEW OF VLSI IMPLEMENTATION OF ANN

1.1 INTRODUCTION

A Biological Neural Network (BNN) consists of highly interconnected neurons that coordinate functions such as breathing, thinking and reading. The network is only partly developed at birth and is fully developed through experience and learning. Hence learning is essentially a process of establishing new connections or modifying existing ones. Artificial neurons are simple abstractions of biological neurons, programmed in software or modeled in hardware. Networks of artificial neurons, known as Artificial Neural Networks (ANN), have only a fraction of the power of biological neural structures. They are often trained to solve complex functions (Hagen et al. 2002).

The background work for the development of ANN started in the early 20th century with theories of learning and vision, as interdisciplinary work involving physics, psychology and neurophysiology. McCulloch and Pitts (1943) proposed the mathematical model of the neuron and proved that an ANN could compute both arithmetic and logical functions. Their contribution is acknowledged as the origin of ANN research. Hebb (1949) then proposed a mechanism for learning in biological neurons at the cellular level. Rosenblatt (1958) proposed the Perceptron model to solve the first practical application, pattern recognition. It induced some interest in ANN research, but the model itself proved to suit only a limited class of problems. Around the same time, Widrow and Hoff
(1960) developed a new learning algorithm to train the ANN, which is still in use. Minsky and Papert (1969) showed that these networks are suitable only for linearly separable problems. This fact and limited computer capabilities were the main deterrents that put a brake on ANN research for at least a decade. The independent development of memory architectures using ANN by Kohonen and Anderson (1972) and the modeling of Self-Organizing Maps (SOM) by Grossberg (1976) were significant developments in the 1970s.

Research in ANN was revived during the 1980s, owing to the availability of powerful computers and the development of two new concepts. The first was an associative memory using a Recurrent Neural Network (RNN) by Hopfield (1982). The second was the Multi-Layer Perceptron (MLP) with the Back Propagation (BP) algorithm by Rumelhart and McClelland (1986). With these developments, research in the field of ANN was reinvigorated through the exploration of new algorithms and their implementations. Significant developments or attempts at applications had essentially been sporadic. ANNs have been explored in fields such as aerospace, banking, defense, electronics, finance, medicine, robotics, speech processing and telecommunications (Hagen et al. 2002). The wide variety of applications in these fields, ranging from language processing, control, image processing, system modeling and prediction to pattern recognition and classification (Maeda and Tada 2003), has encouraged their future scope.

1.2 BIOLOGICAL NEURAL NETWORK (BNN)

The characteristics of brain function have inspired the development of ANN. Approximately 10^11 neurons with ~10^4 connections per neuron form the biological neural network (Hagen et al. 2002). A neuron has dendrites, a cell body (soma) and an axon, as shown in Figure 1.1.

Dendrites are nerve fibres that carry electrical signals into the cell body, where the signals are summed and a threshold decides the output. The axon is a long fibre that conducts the signals to other neurons. The point that connects the axon of one neuron to the dendrite of another neuron is a synapse. The function of the NN is determined by a complex chemical process, which in turn is influenced by the architecture and the synaptic strength. Only a portion of the neural structure is developed at birth, and it grows continuously through experience. Though biological neurons are very slow (with a response time of the order of 10^-3 s), the brain computes faster than a computer due to its massively parallel structure. The biological and artificial neural networks are similar in the following respects:

- They are highly interconnected.
- Their way of connection determines the function.
- They have massive parallelism that solves complex functions.

Figure 1.1 A Biological Neuron [labels: nerve fibre, cell body, long fibre]

It is worth stressing here that the motivation for realizing ANN in hardware comes primarily from the last of the above, namely massive parallelism, and from the need for hardware solutions to complex problems using Very Large Scale Integration (VLSI) technology as well as optical devices. The focus of the present work is on VLSI implementation of ANN.

1.3 ARTIFICIAL NEURAL NETWORKS

An Artificial Neural Network (ANN) is a model of a highly interconnected network. It comprises simple processing elements known as neurons, and it results in a high degree of parallel computation (Graf et al. 1988). The in-built potential of ANN enables it to solve problems of high complexity. Realization of ANN had essentially been in software. It can be executed with a program in MATLAB or with dedicated software such as Neurodimensions, Neurosolutions, NeuNet Pro and Neuralware, to cite a few. ANN has been proved to be a universal approximator (Brown and Harris 1994, Hornick et al. 1985) with the ability to learn. Learning is a recursive process involving multiple iterations, which takes a long time when implemented in software. However, the availability of powerful hardware at affordable cost has opened up the interesting possibility of realizing ANN in hardware. The focus of the present work is on VLSI hardware realization of ANN that facilitates the development of portable, adaptive and re-configurable systems.

A model of an artificial neuron is shown in Figure 1.2. It comprises a synapse, a summer and an activation (threshold) function block. An individual neuron does very little by way of computation; rather, it carries out thresholding of a combined input (Graf et al. 1988). A synapse is modeled as a multiplier that multiplies each input with its stored weight (synaptic strength). The computation done by the whole network depends on the interconnections among the neurons. In Figure 1.2, x_1, x_2, ..., x_n are the n inputs and w_1, w_2, ..., w_n are the respective weights of the interconnections or synapses. The
thresholding and firing are the key output activities of a neuron. The former is governed by a function of a, the sum computed after multiplication, where a and b_i represent the net and the bias respectively.

Figure 1.2 Model of an Artificial Neuron [inputs x_1 ... x_n, synapse weights w_1 ... w_n, bias b_1, summer Σ producing a, threshold function f(a), output y]

\[ a = \sum_{i=1}^{n} X_i W_i + b_i \tag{1.1} \]

The net output y of the neuron is a function of its activation,

\[ y = f(a) \tag{1.2} \]

hence

\[ y = f\left( \sum_{i=1}^{n} X_i W_i + b_i \right) \tag{1.3} \]

The function in equation 1.3 involves multiplication and addition, resulting in a large circuit even for a moderate number of neurons. ANN can be classified based on architecture and learning algorithm, as elaborated in Appendix 1. Specific types of learning algorithms are briefly explained here.

1.3.1 Types of Learning Algorithms

Learning is a class of adaptation that adjusts the response of a network to unseen but similar stimuli through experience. It is an optimization problem. Here the weight space of a network can be searched,
when the topology and transfer functions are fixed. The training algorithms fall into three categories (Cauwenberghs 1997), namely:

- Supervised learning
- Unsupervised learning
- Reinforcement learning

The choice among these depends on the application. The work here is restricted to applications with known targets. Supervised learning, being best suited to such cases, is the focus of this work. Two types of supervised learning, namely Back Propagation (BP) (Hikawa 2003b) and Simultaneous Perturbation (SP) (Maeda and Tada 2003), are utilized to learn the weights here.

(i) Back Propagation (BP) Algorithm

Back Propagation (BP) learning is characterized by nonlinearity and a high degree of connectivity between the layers of the network, which is determined by the weights through the learning process. It consists of two passes through the layers of the network: a forward pass and a backward pass. In the forward pass, data from the input layer is propagated through the hidden layer neurons to the output layer to compute the output. The algorithm then executes the backward pass, which aims at minimizing the mean-square-error (mse) between the target and the output value. The computed error is used to update the weight values in the backward pass. Due to its efficiency, popularity and familiarity (Jabri and Flower 1992), the BP algorithm has been used extensively in the different phases of the work presented.
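
The two passes described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the single-hidden-layer topology, the sigmoid activation, the per-pattern error E = 0.5 (t - y)^2, the learning rate, the epoch count, the seed and all function names are assumptions chosen for the example.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W1, b1, W2, b2):
    """Forward pass: inputs -> hidden layer -> single output neuron."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y

def backward(x, t, h, y, W2):
    """Backward pass: gradients of E = 0.5 * (t - y)^2 for one pattern."""
    delta_out = (y - t) * y * (1.0 - y)            # output-neuron error term
    gW2 = [delta_out * hj for hj in h]
    gb2 = delta_out
    # Error propagated back through the output weights to each hidden neuron.
    delta_h = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(W2, h)]
    gW1 = [[d * xi for xi in x] for d in delta_h]
    gb1 = list(delta_h)
    return gW1, gb1, gW2, gb2

def train(data, n_hidden=2, lr=0.5, epochs=5000, seed=0):
    """Pattern-by-pattern gradient descent (hyper-parameters are assumptions)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            gW1, gb1, gW2, gb2 = backward(x, t, h, y, W2)
            W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, gW1)]
            b1 = [b - lr * g for b, g in zip(b1, gb1)]
            W2 = [w - lr * g for w, g in zip(W2, gW2)]
            b2 -= lr * gb2
    return W1, b1, W2, b2

def mse(data, params):
    """Mean-square-error between targets and network outputs."""
    return sum((t - forward(x, *params)[1]) ** 2 for x, t in data) / len(data)

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

if __name__ == "__main__":
    params = train(XOR)
    print("mse after training:", mse(XOR, params))
```

Whether such a small network converges on XOR depends on the random initial weights, so the final mse is printed rather than assumed. The derivative terms y(1 - y) and h(1 - h) in the backward pass are the quantities whose hardware cost motivates the alternative algorithm discussed next.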

(ii) Simultaneous Perturbation (SP) Algorithm

The BP algorithm assumes that the neurons have sigmoidal nonlinearity and that the synapses are linear multipliers. These assumptions lead to larger synapses, imposing a constraint on silicon area. Moreover, hardware realization of the BP algorithm is difficult due to the complexity of calculating the derivative of the sigmoid function (Maeda and Tada 2003). It also increases the interconnection complexity, since the modifying quantities have to be applied to all the weights. A suitable alternative is Simultaneous Perturbation (SP) learning, which is a simple and efficient algorithm.

An input vector, a teach signal (target) and a set of random initial weights are presented to the network. The generated output is compared with the target and the error is computed. The weights are then perturbed and the output is computed again. Comparing this result with the teach signal gives the perturbed error. The difference unit is then activated to find the difference between the two errors. From this, the weight modifying quantity is generated and sent to all the weight units of the network. The error function thus measures the gradient as a finite difference, without involving complex calculations. Moreover, the algorithm makes no assumptions about the synaptic characteristics (Montalvo et al. 1997), and hence a compact synapse can be designed. This conserves silicon area appreciably. The absence of an error back-propagation circuit further minimizes the hardware.

1.3.2 Methods of Learning in Hardware

Three different methods of learning have been reported in the literature (Reyneri 2003): off-chip learning, on-chip learning and chip-in-the-loop learning.

- Off-chip learning: The network is trained on an external computer, and the weights are loaded on to the network. It is not
suited for analog systems, since analog storage is neither repeatable nor non-volatile.
- On-chip learning: The algorithm is implemented on the same chip. It consumes a large area and is preferred when adaptive and real-time systems are to be designed. The accuracy of the output depends on the resolution of the trained weights, expressed as a number of bits.
- Chip-in-the-loop learning: For situations in which different problems have to be tackled, a two-pronged approach is followed. Firstly, the network is trained on a computer system to obtain the parameters. Secondly, the weights so obtained are downloaded into the chip for use by the ANN. Chip-in-the-loop learning is well suited for such a method. Since the applications considered in the present work are fixed, an elaborate procedure involving this two-pronged approach is not called for, and hence the method is not pursued further.

1.4 ANN IN VLSI HARDWARE

The real promise of applications of the ANN model lies in specialized micro-electronic hardware that utilizes the parallelism. In this respect, VLSI implementation of ANN is beneficial, since it can fully utilize the massive parallelism found in the biological neural network (Mead 1989). Advances in VLSI have enabled the realization of large and complex NNs on a single chip (Vittoz 1990), provided the components of the neuron are modeled precisely. Both computation time and power consumption can be minimized when a compact design is ensured. A substantial part of the work is based on system design with off-chip learning. However, portable devices are very
demanding, in the sense that they need independence and minimal power. These should not come at the expense of speed. Hence a conscious attempt has been made to design a portable and independent system with on-chip learning.

Direct fabrication of ANN in VLSI hardware is henceforth referred to as Neurohardware. The parallel operation of the hardware increases the speed of computation of the Neurohardware (Marchesi et al. 1993, Palmes et al. 2005). It also exhibits improved fault tolerance to suit real-time applications (El-Marsy et al. 1997). The approach to hardware implementation of NN can be digital, analog, mixed-signal or pulse-stream based design (Murray et al. 1991). Of these, analog designs, being non-programmable, are not pursued further in the present work. Rather, an efficient hybrid approach that combines NN and Genetic Algorithm (GA), giving rise to the Genetically Evolved Neural Network (GENN), is explored in the digital domain.

Different hardware platforms such as Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP) and Field-Programmable Gate Arrays (FPGA) have been explored in the literature for the implementation of ANN. ASIC-based designs are rigid in nature. DSP-based implementations are sequential in computation and hence might not suit the parallel computation of NN. FPGA-based implementations, on the other hand, not only provide flexibility and reliability but also enable the realization of parallel architectures. An FPGA is a programmable device with a large number of generic logic blocks and interconnection blocks. The configuration of an FPGA decides its functionality through the interconnections of the blocks. The re-configurability of FPGA allows the designer to realize ANNs that solve the different tasks at hand (Braendler et al. 2002). Complex applications are configured on high-density FPGAs using automated synthesis software.

1.4.1 Digital Design of ANN

The digital design technique has the following advantages:

- Simple, flexible and mature fabrication techniques.
- Re-configurable and adaptable designs.
- Easy to embed in applications.

It is observed that the digital technique is not optimum in area and speed, especially when the synapse multiplier is realized. In addition, signals that are analog in nature have to be converted to digital form. Hence the second phase of implementation is executed in the mixed-signal design domain.

1.4.2 Mixed-signal Design of ANN

The advantages of analog design lie in its lower area and higher speed of operation. In contrast, digital design scores in terms of programmability. A mixed-signal design of ANN judiciously combines the advantages of both. Such designs are characterized by compact and high-speed analog synapses, analog neurons and digitally programmable weight storage.

1.4.3 Pulse-stream based Design of ANN

Pulse-stream (PS) representations are widely accepted for implementing ANN in hardware (Murray et al. 1991). PS techniques use quasi-periodic binary signals, which encode all the variables of the ANN as bit streams. The PS technique can be treated as a form of mixed-signal design, since it combines the merits of analog circuit processing and digital signal processing. Such an approach eases VLSI implementation (Murray 1992). It enables intra-chip and inter-chip communication with high noise immunity (Murray et al. 1991). It does not quantize the information and
hence preserves the resolution maintained in analog computation. Moreover, the communication of binary pulses representing analog information exhibits the advantages of digital signals and circuitry. Applying PS to ANN provides the following advantages over either fully analog or fully digital implementations (Reyneri et al. 1994, Ota and Wilamowski 1996):

- High noise immunity.
- Ease of re-configuration.
- Ease of multiplexing and interfacing.
- Less susceptibility to process variations between devices.

1.4.4 Genetically Evolved Neural Network (GENN)

Designing an optimal architecture to suit specific applications is a critical issue in the field of ANN. For such an implementation, one needs to know the complexity of the problem, the number of hidden layers and neurons, the convergence capability and so on (Franco and Cannas 2001). Most ANN realizations utilize the BP algorithm and its variants. They assume a fixed ANN architecture and activation function; hence only the weights are learnt. Too small a network deteriorates performance, whereas too large a network results in redundant connections, which increase the area consumption (Leung et al. 2003). Hence a structured procedure adopting either a constructive or a pruning approach (Setiono and Hui 1995, Reed 1993) can be followed. The first method starts with a network of minimal neurons and adds neurons during training if necessary. In contrast, the second method starts with a large network and deletes insignificant connections, neurons and layers during training. Both methods consume a long computation time. A better alternative to the constructive or pruning algorithms is Evolutionary Algorithms (EA), as suggested by Miller et al. (1989). It results in an optimal
architecture through a search process in the architecture space. A point that satisfies the necessary constraints, such as simple architecture, least error and fast learning, is chosen as the optimal point. It helps to evolve an optimal architecture. It is logical for researchers to combine EA and ANN in the hope that the two approaches can complement each other's strengths and compensate for each other's weaknesses in tackling problems (Palmes et al. 2003). This leads to the Evolutionary Artificial Neural Network (EANN), which is adaptable to dynamic environments (Yao 1999).

EA can broadly be classified into Evolutionary Programming (EP), Evolution Strategies (ES) and Genetic Algorithms (GA), though many other forms have been explored in the recent past (Palmes et al. 2005). GA depends on the crossover operator, whereas the other two depend on mutation. In this work, GA is chosen because crossover works best when building blocks exist (Yao 1999). GA is a random search technique that suits complex optimization problems with a large number of parameters (Leung et al. 2003). GA is suitable for problems where gradient information is not available or is costly to obtain, or where a non-differentiable node transfer function is involved (Siddique and Tokhi 2001). It is a derivative-free stochastic optimization method, applicable to both continuous and discrete optimization problems. GA adopts the natural phenomenon of survival of the fittest to retain not only the fittest individuals but also the fittest genes (Srinivas and Patnaik 1994).

1.5 SCOPE AND OBJECTIVE OF THE WORK

The work broadly involves four phases, as outlined below:

- Efficient and flexible design of digital Neurohardware.
- Compact and high-speed design of mixed-signal Neurohardware.
- Robust and re-programmable design of pulse-density Neurohardware.
- Fast, adaptive and optimal design of Genetic Neurohardware.

The digital architecture implemented in the first phase of the work diagnoses bladder cancer and breast cancer cells. It follows the idea proposed by Moallemi (1991), with Quick Propagation off-chip learning, to classify the cell images as well or not well. It could be used in the preliminary stage of a diagnosis procedure. The design proposed in this thesis uses feature extraction procedures to identify whether a cell is normal or cancerous, and thus diagnoses the disease. Further, only software simulation is reported by Moallemi (1991), whereas an FPGA realization is implemented in this thesis.

In order to minimize the resource requirement, the sequential processing nature of ANN is exploited (Himavathi et al. 2007). It follows the design principle of realizing only the single largest layer in hardware. A properly designed control unit realizes the whole network in a time-multiplexed way; this is chosen as the second architecture of the first phase. Himavathi et al. (2007) use a Look Up Table (LUT) based Activation Function (AF). Here the ANN is realized with a Piece-Wise Linear (PWL) AF to optimize area without much variation in accuracy, which forms a key contribution of the work. Moreover, analytical evaluation of the weights obviates the need for a separate learning algorithm in software or a learning unit in hardware. Simulation and synthesis of designs solving different problems are presented. The implementation results exhibit the advantages and show that the objectives are satisfied. Hardware requirements are compared and speed is measured for different target devices.
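
To illustrate why a PWL activation function can replace a look-up table at little cost in accuracy, the sketch below compares one possible PWL approximation with the exact sigmoid. The breakpoints and slopes are assumptions made for this example only (they are not the segments used in this thesis); the slopes are powers of two so that, in fixed-point hardware, each segment would reduce to a shift and an add. The function names are likewise hypothetical.

```python
import math

def sigmoid(a):
    """Exact logistic sigmoid, used only as the reference."""
    return 1.0 / (1.0 + math.exp(-a))

def pwl_sigmoid(a):
    """Piece-wise linear sigmoid approximation (assumed breakpoints)."""
    x = abs(a)
    if x >= 5.0:
        y = 1.0                        # saturation region
    elif x >= 2.375:
        y = 0.03125 * x + 0.84375      # slope 2^-5
    elif x >= 1.0:
        y = 0.125 * x + 0.625          # slope 2^-3
    else:
        y = 0.25 * x + 0.5             # slope 2^-2
    # The sigmoid is symmetric about (0, 0.5): f(-a) = 1 - f(a).
    return y if a >= 0 else 1.0 - y

def max_abs_error(lo=-8.0, hi=8.0, steps=1601):
    """Largest deviation from the exact sigmoid on a uniform grid."""
    step = (hi - lo) / (steps - 1)
    return max(abs(pwl_sigmoid(lo + k * step) - sigmoid(lo + k * step))
               for k in range(steps))

if __name__ == "__main__":
    print("max |PWL - sigmoid| on [-8, 8]:", max_abs_error())
```

With these assumed segments the deviation from the exact sigmoid stays below 0.02 over the grid, while only three slope and offset pairs have to be stored instead of a full table.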

The second phase of the work targets mixed-signal design. The approach followed here is more fundamental, in the sense that the small-signal model of the MOSFET is first developed using the Simulink tool. It explores the domain of hardware/software co-design (Reyneri 2003). Realizing two different architectures (Ota and Wilamowski 1999, Koosh and Goodman 2002) of analog synapse and neuron with digital weights verifies its versatility. The modeled MOSFET realizes the pulse-coupled neuron and the transconductance neuron of the two approaches respectively. This forms the significant part of the second phase. Moreover, the neuron is configured to solve the N-bit parity problem with N/2 neurons in the hidden layer (Wilamowski et al. 2003).

In the third phase, different architectures that follow the Pulse Density Modulation (PDM) technique are realized. Murray et al. (1991) pointed out that the firing rate of the action potential in biological neurons can be categorized as frequency modulation. Hence the PD modulation scheme, which is based on frequency modulation, is utilized for implementation to mimic the natural process. The first approach adopts off-chip BP learning to minimize the hardware. It is based on an architecture (Hikawa 2003b) that overcomes drawbacks of conventional methods such as a fixed-slope AF and a restricted weight range. In order to realize a self-contained system, the second PDM architecture is implemented with on-chip learning. Hence the hardware-friendly SP algorithm is chosen, which does not need a complex error propagation mechanism. To optimize the hardware further, a single learning unit that modifies all the weights of the network simultaneously is utilized.

The final phase explains the development of the Genetically Evolved Neural Network (GENN). Results of two different approaches are presented. The evolved weights are converted to a binary stream of data and applied to the digital design of GENN.
It is to be stressed here that the genetic evolution of a Feed-Forward Neural Network (FFNN) to solve the N-bit parity problem has resulted
in an architecture with N/2 neurons in the hidden layer, where N neurons are otherwise found to be needed (Minnick 1961, Hertz et al. 1991). Thus, an appreciable reduction in hardware can be considered an important contribution to the field of VLSI realization of ANN. Moreover, a neuron that does not involve a multiplier is designed specifically to suit the hidden layer of an ANN with binary inputs. The functionality is verified after synthesis on different target FPGAs.

1.6 ORGANIZATION OF THE THESIS

Different design techniques for realizing ANN in VLSI hardware are chosen for analysis in the present work. Implementation details are presented in separate chapters of the thesis.

Chapter 1 presents various design approaches and their potential applications. Further, the limitations and the scope for exploration are analyzed.

Chapter 2 reviews the literature, considering various design techniques for study. The four phases of implementation are discussed in consecutive chapters.

The first phase, comprising two different digital implementations on FPGA, is presented in Chapter 3. In the first technique, the network realizes in hardware a novel application: diagnosis of bladder cancer and breast cancer cells. In the second approach, an ANN with a layer multiplexing scheme is implemented, in which the PWL-AF with 18-bit fixed-point representation is followed. The scheme follows analytic evaluation of weights to solve XOR and four-bit even-parity generation.

Chapter 4 deals with the second phase of implementation, using the mixed-signal design technique. Instead of any conventional SPICE tool, the Simulink tool is used to model the MOSFETs. The model is configured to construct two different analog architectures of the synapse, the neuron and the complete network. The first design adopts the BP algorithm to solve two-bit
XOR, character recognition and N-bit parity problems. The second design follows the Successive Approximation (SA) algorithm to realize two-bit XOR and four-bit parity generation and checking architectures.

The third phase of implementation, based on Pulse-Density (PD) encoding, is presented in Chapter 5. Two different architectures of pulse-density Neurohardware are realized, one with a programmable PWL-AF and the other with on-chip SP learning. The two-bit XOR function and N-bit parity functions are implemented to validate both architectures.

Chapter 6 elaborates the fourth phase of implementation, which follows two approaches for the development of GENN. In the first approach, weight evolution of a fixed architecture is executed. In the second approach, an invasive method is used, in which GA is explored to evolve both the network structure and the weights. Simultaneous evolution of the structure and weights of a GENN solves N-bit parity using only N/2 hidden layer neurons.

Conclusions and the scope for further studies are discussed in Chapter 7, highlighting the contribution towards hardware implementation of ANN. The results of the different approaches and architectures are analyzed in the respective chapters.
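
As a compact reminder of the SP learning rule of Section 1.3.1, which Chapter 5 realizes on-chip, the following Python sketch shows one way the update could be written. It is a minimal sketch under stated assumptions, not the hardware algorithm of this thesis: the ±1 sign vector, the perturbation size c, the learning rate, the step count, the seed and the function names are all chosen for illustration, and the cost function stands in for the network error measured against the teach signal.

```python
import random

def sp_step(cost, w, c=0.01, lr=0.01, rng=random):
    """One simultaneous-perturbation update of the weight vector w."""
    s = [rng.choice((-1.0, 1.0)) for _ in w]           # random sign per weight
    j0 = cost(w)                                        # error without perturbation
    j1 = cost([wi + c * si for wi, si in zip(w, s)])    # perturbed error
    diff = (j1 - j0) / c                                # one shared scalar (difference unit)
    # The same scalar is broadcast to every weight; only the sign differs.
    return [wi - lr * diff * si for wi, si in zip(w, s)]

def sp_train(cost, w, steps=2000, c=0.01, lr=0.01, seed=0):
    """Repeat the SP update for a fixed number of steps (values are assumptions)."""
    rng = random.Random(seed)
    for _ in range(steps):
        w = sp_step(cost, w, c, lr, rng)
    return w

if __name__ == "__main__":
    # Toy quadratic error with a known minimum, used only to exercise the rule.
    target = [1.0, -2.0, 3.0]
    cost = lambda w: sum((wi - ti) ** 2 for wi, ti in zip(w, target))
    w = sp_train(cost, [0.0, 0.0, 0.0])
    print("final weights:", w, "final error:", cost(w))
```

Each update needs only two evaluations of the error, whatever the number of weights, and no derivative of the activation function; this is the property that makes the rule attractive for a single shared learning unit.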