Application of Neural Networks on Cursive Text Recognition
Dr. HABIB GORAINE
School of Computer Science, University of Westminster
Watford Road, Northwick Park, Harrow HA1 3TP, London, UNITED KINGDOM

Abstract: - This paper describes an Arabic text recognition system based on neural networks. Text is input using a scanner and pre-processing is applied to it to separate lines and words. A processing stage is then applied to each image word, in which thinning, stroke segmentation and feature extraction are performed. Following this, strokes are classified into eleven primitives using a three-layer neural network trained with the back-propagation algorithm. In the recognition stage the primitives, together with some features, are classified into characters or parts of characters. A secondary stage combines primitives into Arabic characters and resolves ambiguities between pairs and triplets of characters.

Key-Words: - Pattern recognition, Arabic character recognition, artificial neural networks, back propagation, skeleton, segmentation.

1. Introduction
If you ever find yourself wasting valuable time keying in pages of typewritten text, computer printout, faxed documents or newspaper articles, then you are doing things the hard way. For Latin characters, there are several good commercial recognition packages that can ease the burden of getting text off the paper and into your computer. What about Arabic characters? Arabic character recognition presents a real challenge because Arabic writing is cursive by nature. In the past decade there has been increasing research into Arabic character recognition [1,2,3]. This field is important not only for Arabic-speaking countries, but also for Persian- and Urdu-speaking countries, which have similar character sets. In this paper, a text recognition system is presented which is based on previous research [8], in which Arabic text is scanned and lines and words are separated at high speed.
A processing stage is then applied to each image word, in which thinning, stroke separation and feature extraction are performed. Following this, strokes are classified into eleven primitives using a three-layer neural network trained with the back-propagation algorithm. In the recognition unit the primitives, together with some features, are classified into characters or parts of characters. A secondary classifier combines strokes into characters and resolves ambiguities between pairs and triplets of characters. The steps involved in the process are shown in figure 1 and described in the following sections.
Figure 1 (stages of the Arabic recognition system): data acquisition (scanning-in a page of text); pre-processing (text-line separation, word separation); processing (thinning, stroke separation, sampling, stroke representation); stroke classification (neural network); character classification (features).

2. Preprocessing
In order to isolate words from the text, lines are separated first and then each line is separated into words.

2.1 Separating lines
The horizontal histogram of the image text is computed. The valleys found indicate the gaps between lines, because printed Arabic text is written horizontally and empty lines separate lines from each other. The maxima represent the base lines and the minima indicate the interline markers [8].

2.2 Separating words
After lines have been separated, a method is applied to separate the words of each line. Unlike English cursive script, an Arabic word can be composed of one or more parts separated by a blank space. For this reason, the separation of words within the lines is more complex: the system must distinguish between the gaps in the vertical histogram of each line that indicate letter boundaries within the same word and those that indicate word boundaries. In order to separate words, each line is scanned from right to left and the width of the gaps is determined. A word is separated if the gap exceeds a certain threshold, fixed experimentally.

3. Processing
In order to recognise an Arabic text, a processing stage is applied which consists of the steps needed to prepare an image word for recognition. It is an important part of the recognition system and involves four main steps: thinning, stroke segmentation, smoothing and sampling, and stroke representation. Figure 2 shows the results of the processing stage.

3.1 Thinning
The aim of the thinning process is to reduce the image word to a thin line. This makes it possible to create dynamic information, such as the stroke sequence, from static images.
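The histogram-based line and word separation of Sections 2.1 and 2.2 can be sketched as follows. This is a minimal sketch, not the paper's implementation: it assumes a binary image with ink pixels set to 1, the function names `ink_runs` and `merge_runs` are illustrative, and the gap threshold would be fixed experimentally, as in the paper.

```python
import numpy as np

def ink_runs(profile):
    """(start, end) index pairs of consecutive non-zero entries
    in a projection histogram (valleys of zeros separate the runs)."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def merge_runs(runs, gap_threshold):
    """Merge runs separated by gaps narrower than gap_threshold, so that
    sub-word gaps stay inside one word and only wide gaps split words
    (Section 2.2)."""
    merged = [runs[0]]
    for s, e in runs[1:]:
        if s - merged[-1][1] < gap_threshold:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

# Toy binary page: two text lines with blank rows between them.
page = np.zeros((10, 12), dtype=int)
page[1:3, :] = 1   # line 1
page[5:8, :] = 1   # line 2
h_profile = page.sum(axis=1)   # horizontal histogram -> line boundaries
lines = ink_runs(h_profile)
print(lines)  # [(1, 3), (5, 8)]
```

For word separation the same idea is applied to the vertical histogram of each line, with `merge_runs` keeping the narrow inter-letter gaps inside one word.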
Hilditch's method [5], combined with the unification of junctions [9], proved to give very good results and facilitated the stroke segmentation procedure.

3.2 Stroke segmentation
The aim of the stroke segmentation technique is to create artificial time information from a static image (such as the time sequence of pen directions) and to reduce the number of strokes, which proved to be very efficient when recombining them into characters. The stroke segmentation method consists of breaking down an Arabic word into principal strokes, which are strings of coordinates, and
secondary strokes, which are additions to the principal ones. The segmentation algorithm is given in detail in a previous paper [2]. It consists of three distinct, sequentially applied steps: i. identifying the start-point of the stroke; ii. identifying the end-point of the stroke; iii. tracing the stroke from the start-point to the end-point. Figure 2 shows the strokes obtained from the segmentation process of an Arabic word.

3.3 Smoothing and sampling
This process serves as a filter that eliminates redundant points and retains the minimum number of points needed to recognise characters. The filter consists of a sampling algorithm based on angular segmentation [4]. The algorithm imposes the condition that the change of direction of the curve between two consecutive sampled points does not exceed a certain threshold angle. Figure 2 shows an example of the sampling process in which only a minimum number of points are kept.

3.4 Stroke representation
In order to feed the stroke to the network, each pair of consecutive sampled points is represented by a segment, so each stroke is represented by a string of segment directions. Each direction angle is computed from two consecutive points and then normalised to act as an input vector to the neural network.

4. Stroke classification
The back-propagation neural network architecture with sigmoid transfer function is shown in figures 3.1 and 3.2. This model has three layers: an input layer, an output layer, and a layer in between called the hidden layer. Each unit in the hidden layer and the output layer is like a perceptron unit. The units in the input layer serve to distribute the values they receive to the next layer. The learning rule for the multilayer perceptron is called the generalised delta rule, or the back-propagation rule, and was suggested in 1986 by Rumelhart, Hinton, and Williams [10].
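Before the network is described further, the angular sampling and angle-string representation of Sections 3.3 and 3.4, which produce its input vectors, can be sketched as follows. This is a simplified illustration, not the algorithm of [4]: the function names and the 30-degree threshold are assumptions, and the angle normalisation simply maps degrees into [0, 1).

```python
import math

def direction(p, q):
    """Direction of the segment p -> q in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def angular_sample(points, max_turn_deg=30.0):
    """Keep a point only where the curve direction turns by more than
    max_turn_deg; nearly straight runs collapse to a single segment."""
    kept = [points[0], points[1]]
    for p in points[2:]:
        turn = direction(kept[-1], p) - direction(kept[-2], kept[-1])
        turn = abs((turn + 180.0) % 360.0 - 180.0)   # wrap to [0, 180]
        if turn > max_turn_deg:
            kept.append(p)       # direction changed: keep this point
        else:
            kept[-1] = p         # nearly straight: slide the last point on
    return kept

def angle_string(points):
    """Represent a stroke as normalised segment directions in [0, 1)
    between consecutive sampled points (Section 3.4)."""
    return [(direction(p, q) % 360.0) / 360.0
            for p, q in zip(points, points[1:])]

# An L-shaped stroke: straight to the right, then straight up.
stroke = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(angular_sample(stroke))  # [(0, 0), (3, 0), (3, 3)]
```

Only the corner and the two endpoints survive sampling; the resulting angle string (here one horizontal and one vertical segment) is what the classifier sees.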
The network operates by being shown a pattern and computing its response; comparison with the desired response enables the weights to be altered so that the network produces a more accurate output next time. The learning rule provides the method for adjusting the weights in the network. When an input pattern is presented to the untrained network, it will produce an essentially random output. An error function that represents the difference between the current output and the desired output is computed. In order to learn successfully we want the output of the net to converge towards the desired output; this is achieved by adjusting the weights on the links between the units. The back-propagation network has separate stages for learning and operation. Once the network has been trained, the learning process is stopped and the connection weights are fixed. Each stroke is presented to the neural network as a string of normalised angles ready to be processed. The neural network consists of three layers: nine input neurons, four neurons in the hidden layer, and eleven neurons in the output layer.

Figure 3.1: The multilayer perceptron (input, hidden and output layers).
Figure 3.2: The sigmoid transfer function.

5. Recognition stage
Recognition is achieved in two stages. The first stage classifies each stroke into one of the eleven primitives using a three-layer back-propagation neural network; the second stage consists of a description of each character in the form of a feature vector. The features used for character classification are the position of the stroke in the word, the shape of the stroke (which can be one of the eleven primitives), the existence of a loop, the number of dots and their position, and the presence of secondary strokes. Finally, some strokes are combined into characters and ambiguities are resolved between pairs of characters using geometrical measurements on the character and layout context that covers base-line information and the location of one character with respect to its neighbours.

6. Experimental results
In order to make a comparison with the previous recognition system, the same data was used. It consists of a training set of 60 printed Arabic words written in the Naskhi font, the words being written in a horizontal line and not slanted. The test data consisted of another set of 60 words of the same font.
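The three-layer network of Section 4 (nine inputs, four hidden units, eleven outputs, sigmoid units trained with the generalised delta rule) can be sketched as follows. This is a minimal sketch rather than the paper's implementation: biases and any momentum term are omitted for brevity, and the learning rate, random seed and toy target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from Section 4: 9 inputs, 4 hidden units,
# 11 outputs (one per primitive class).
W1 = rng.normal(0.0, 0.5, (9, 4))
W2 = rng.normal(0.0, 0.5, (4, 11))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """Propagate an input vector through hidden and output layers."""
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    return h, y

def train_step(x, target, lr=0.5):
    """One generalised-delta-rule update on the squared error."""
    global W1, W2
    h, y = forward(x)
    delta_out = (y - target) * y * (1.0 - y)         # output-layer error term
    delta_hid = (delta_out @ W2.T) * h * (1.0 - h)   # back-propagated to hidden
    W2 -= lr * np.outer(h, delta_out)
    W1 -= lr * np.outer(x, delta_hid)
    return float(np.sum((y - target) ** 2))

# Toy usage: push one angle-string input towards primitive class 3.
x = rng.uniform(0.0, 1.0, 9)   # nine normalised angles
t = np.zeros(11)
t[3] = 1.0
errors = [train_step(x, t) for _ in range(200)]
print(errors[-1] < errors[0])  # True: the error decreases during training
```

Once trained, the weights are frozen and only `forward` is used, mirroring the separate learning and operation stages described above.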
These same 60 words were tested in three different sizes (large, medium, small), making in all about 300 characters. The recognition rate was higher than that obtained in the previous research [8].

7. Conclusion
In this paper a new Arabic character recognition system has been developed, based on previous research. A back-propagation network is used to classify strokes into one of eleven primitives, which are combined with some features in order to recognise Arabic characters. The main approach works, and further research will be based on a cluster of neural networks for the whole system.

References:
[1] G. Auda and H. Raafat, An Automatic Text Reader Using Neural Networks, IEEE, March 1993, pp. 92-95.
[2] H. Goraine, M. Usher, and S. Al-Emami, Offline Arabic Character Recognition of Isolated Arabic Words, IEEE Computer System Analysis, June 1992.
[3] H. Al-Mualim and S. Yamaguchi, A Method of Recognition of Arabic Cursive Handwriting, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 9, No. 9, pp. 715-722, September 1987.
[4] M. Berthod, Expérimentations sur l'échantillonnage de tracés manuscrits en temps réel, Congrès AFCET-IRIA, traitement des
images et reconnaissance des formes, Gif-sur-Yvette, Février 1978.
[5] C.J. Hilditch, Linear Skeletons from Square Cupboards, in Machine Intelligence 4, 1969, pp. 403-420.
[6] B. Hussain and M.R. Kabuka, A Novel Feature Recognition Neural Network and its Application to Character Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 1, January 1994, pp. 98-106.
[7] S. Knerr, L. Personnaz, and G. Dreyfus, Handwritten Digit Recognition by Neural Networks with Single-Layer Training, IEEE Transactions on Neural Networks, Vol. 3, No. 6, November 1992, pp. 962-968.
[8] H. Goraine and M.J. Usher, Printed Arabic Text Recognition, ICEMCO, London, October 1994.
[9] S. Al-Emami, Recognition of Handwritten and Typewritten Arabic Characters, PhD thesis, Department of Cybernetics, University of Reading, September 1988.
[10] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, Learning Internal Representations by Error Propagation, in Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, MA, 1986.