Neural-based Solutions for the Segmentation and Recognition of Difficult Handwritten Words from a Benchmark Database

M. Blumenstein and B. Verma
School of Information Technology
Griffith University, Gold Coast Campus
PMB 50, Gold Coast Mail Centre, QLD 9726, Australia
Telephone: +61 7 5594 8738
Fax: +61 7 5594 8066
E-mail: {M.Blumenstein, B.Verma}@gu.edu.au
WWW: http://intsun.int.gu.edu.au

Abstract

A new intelligent segmentation technique is proposed that can be used in conjunction with a neural classifier and a simple lexicon for the recognition of difficult handwritten words. The segmentation technique initially employs a heuristic algorithm which searches for structural features within handwritten word images. As a result, the algorithm over-segments each word. An Artificial Neural Network (ANN) trained with 32,034 segmentation points is then used to verify the validity of the segmentation points found. Following segmentation, character matrices from each word are extracted, normalised and then passed through a global feature extractor, after which a second ANN trained with segmented characters is used for classification. The recognised characters are grouped into words and presented to a variable-length lexicon, which uses a string processing algorithm to compare and retrieve the words with the highest confidence. This research provides promising results for segmentation, character recognition and word recognition. In particular, the results obtained from the segmentation of handwritten words contained on the CEDAR CD-ROM can be used for comparison with other researchers using the same benchmark database. Although many segmentation techniques can be found in the literature, only a small number report segmentation accuracies for word samples from benchmark databases.

Keywords: Segmentation, Artificial Neural Networks, Handwriting Recognition, Character Recognition, OCR.

1. Introduction

The literature has been inundated with research detailing new techniques for the classification of handwritten numerals. In particular, many researchers have obtained very impressive results using neural techniques [1-5]. Unfortunately, satisfactory results have not yet been obtained for the segmentation and recognition of handwritten words. Researchers have used many different approaches for both the segmentation and recognition tasks of word recognition, often combining the two in a hybrid or tightly coupled system. Some researchers have used conventional, heuristic techniques for both character segmentation and recognition [6, 7], while others have used heuristic techniques for segmentation followed by ANN-based methods for the character/word recognition process [8, 9]. However, only a handful of researchers have used ANNs for the segmentation of printed and cursive handwriting followed by the subsequent recognition of characters [10, 11]. Even fewer have detailed their findings for the segmentation process of their system. Most researchers tend to measure the success of their system by their findings from the character or word recognition phases. As noted in numerous references [12-14], segmentation plays an important role in the overall process of handwriting recognition. There is still a need to compare results for the segmentation of handwriting using benchmark databases. Cursive word segmentation deserves particular attention, as it has been acknowledged as the most difficult of all handwriting segmentation problems [15].

This research attempts to integrate both heuristic and intelligent methods for the segmentation of cursive and printed handwritten words. For the initial task of segmentation, a feature-based heuristic technique is used to locate prospective segmentation points in handwritten words. An Artificial Neural Network trained with valid segmentation points from a database of handwritten words is used to assess the correctness of the segmentation points found by the algorithm. Following segmentation, character matrices from the handwritten words are extracted and processed so that they may be recognised by a second ANN trained with segmented handwritten characters. Finally, to show how the segmentation technique may be used in the context of an overall system, a lexicon is used to match each set of segmented handwritten characters (each set represents a single word) to potential correct words. The entire system is shown in Figure 1.

The remainder of the paper is divided into six sections. Section 2 describes the proposed segmentation technique, Section 3 discusses the character recognition phase, Section 4 explains the lexicon used for word recognition, Section 5 provides experimental results, Section 6 discusses those results and Section 7 draws a conclusion.

Figure 1. Complete Handwriting Recognition System

2. Proposed Segmentation Technique

This section addresses the steps required to segment the handwritten words using the proposed technique. An overview of the technique is provided in Figure 2.

Figure 2. Proposed Segmentation Technique: (a) Stage 1: Training Phase (b) Stage 2: Testing Phase

2.1 Preprocessing

Prior to segmentation and recognition, it was necessary to preprocess all word images. Initially the images were all in a grey-level format. Otsu's thresholding algorithm [16] was used to binarise the images. Many of the cursive words, and even some of the printed words, were slanted at various angles. It was therefore necessary to employ a slant detection and correction technique [6] before the segmentation technique could be applied.

2.2 Overview of the Heuristic Algorithm

For both the training and recognition phases, a heuristic feature detection algorithm is used to locate prospective segmentation points in handwritten words. Each word is inspected in an attempt to locate characteristics representative of segmentation points. Firstly, the average character width of the word is estimated by finding segregated characters in the text and calculating their average width. If no segregated characters are found within a particular word, the average word height is used to calculate an approximate character width. Next, the upper and lower word contours are determined to enable the location of upper and lower minima in the word (possible ligatures in cursive writing). A histogram of vertical pixel density is then calculated to further confirm the location of possible segmentation points in the word. Words are also scanned for possible holes, i.e. areas of a word that may be occupied by an "o", "a", "b", etc. These regions are marked as being inappropriate to accommodate possible segmentation points. Next, the word is scanned to examine whether segmentation points have been properly distributed throughout the word. Clusters of proximate segmentation points are analysed and reduced in number, so that only a small collection of the more likely points remains for any particular area. Finally, areas of a word that lack segmentation points are examined: if an area wider than the calculated average character width has a sparse distribution of segmentation points, a segmentation point is forced in the most likely position within that area. The result is a set of over-segmented words which await ANN verification.

Figure 3. Some of the steps in heuristic segmentation: (a) Minima found in each word indicating possible ligatures (b) Vertical pixel density histogram for a word (c) Location of holes in a word

2.3 The Training Phase of the Segmentation Technique

Prior to ANN training, the heuristic feature detector is used to segment all words required for the training process. The segmentation points output by the heuristic feature detector are manually analysed so that their x-coordinates can be categorised into correct and incorrect segmentation point classes. For each segmentation point in a particular word (given by its x-coordinate), a matrix of pixels is extracted and stored in an ANN training file. Each matrix is first normalised in size, and then significantly reduced in size by a simple feature extractor. The feature extractor breaks the segmentation point matrix down into small windows of equal size and analyses the density of black and white pixels.
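The window-density step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name window_densities, the 0/1 pixel encoding (1 = black) and the requirement that the matrix divide evenly into windows are all assumptions made for the example.

```python
def window_densities(matrix, win_h, win_w):
    """Return the fraction of black pixels in each non-overlapping window.

    `matrix` is a list of rows holding 0/1 values (1 = black pixel, an
    assumed encoding). Its height and width must be exact multiples of
    the window size. Windows are visited row by row, left to right.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % win_h or cols % win_w:
        raise ValueError("matrix size must be a multiple of the window size")
    features = []
    for top in range(0, rows, win_h):
        for left in range(0, cols, win_w):
            black = sum(
                matrix[r][c]
                for r in range(top, top + win_h)
                for c in range(left, left + win_w)
            )
            features.append(black / (win_h * win_w))
    return features


# A 3x6 matrix split into two 3x3 windows; the left window has 7 black pixels.
example = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
]
densities = window_densities(example, 3, 3)
print([round(d, 2) for d in densities])  # [0.78, 0.11]
```

Under the same assumptions, a 20x20 character matrix processed with 4x4 windows yields 25 density values, which is the kind of reduction used later for character recognition.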
Instead of presenting the raw pixel values of the segmentation points to the ANN, only the density of each window is presented. As an example, if a window is 3x3 in dimension and contains 7 black pixels, then a single value of 0.78 (number of black pixels / 9) is written to the training file to represent the window. Accompanying each matrix, the desired output is also stored in the training file (0.1 for an incorrect segmentation point and 0.9 for a correct point), ready for ANN training.

2.4 The Testing Phase of the Segmentation Technique

Following ANN training, the words used for testing are also segmented using the heuristic feature-based algorithm. This time there is no manual processing. The segmentation points are automatically extracted and fed into the trained ANN. The ANN then verifies which segmentation points are correct and which are incorrect. Upon ANN verification, each word used for testing should contain only valid segmentation points, which can then be used for further processing.

3. The Recognition of Segmented Characters

Another area of the handwriting recognition domain that has not received sufficient attention is the comparison of researchers' results for segmented character recognition using benchmark handwritten word databases. Following the technique described in Section 2, character segments were extracted from each word and then recognised by a classifier. Using the segmentation points generated for training in Section 2.3, segmented characters were extracted to train a backpropagation neural network. The extracted characters were first normalised and then reduced in size by the global feature extraction technique detailed in Section 2.3 and illustrated in Figure 4. Characters used for testing were extracted using the same procedure. Following neural network training, segmented test characters were passed through the ANN and classified.

Figure 4. A window of 4x4 in dimension is extracted from a character matrix

4. Recognition of Words Using a Simple Lexicon

A variable-sized lexicon of words was implemented to recognise all words used for testing. The lexicon was implemented solely to indicate how the segmentation technique could be used as part of a fully operational handwriting system. It must therefore be noted that our research was mainly focussed on producing an accurate segmentation component, not a highly accurate word recogniser. Each recognised character set from the previous section (representing a single word) was used to test the lexicon. The lexicon used a simple string comparison algorithm, which first matched each character of each lexicon word to the characters in the test word being examined. The number of correct characters was noted. In further processing, information such as the order of the characters found in each test word and the length of each test word was compared with that of all lexicon words. Each word in the lexicon was given a confidence rating for every test word, depending on the number of matching characters found and the number of characters that appeared in the correct sequence (see Figure 5).
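A string comparison of this kind could be sketched as below. The exact confidence formula is not given in the text, so the one used here is an assumption: it averages the number of shared characters and the number of characters appearing in the same order (a longest common subsequence), both normalised by the longer word's length. The function names and the sample lexicon are likewise hypothetical.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence: characters in matching order."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def confidence(test_word, lexicon_word):
    """Assumed confidence in [0, 1] combining character and order matches."""
    test, entry = test_word.lower(), lexicon_word.lower()
    longest = max(len(test), len(entry))
    if longest == 0:
        return 0.0
    # Characters present in both words, ignoring position.
    shared = sum((Counter(test) & Counter(entry)).values())
    # Characters that also appear in the same relative order.
    in_order = lcs_length(test, entry)
    # Dividing by the longer length penalises length mismatches.
    return (shared + in_order) / (2 * longest)


def rank_lexicon(test_word, lexicon, top_n=5):
    """Return the top_n (confidence, word) pairs, best first."""
    scored = [(confidence(test_word, word), word) for word in lexicon]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Hypothetical lexicon and a test word with one character lost in segmentation.
lexicon = ["Buffalo", "Boston", "Albany", "Amherst"]
for score, word in rank_lexicon("Bufalo", lexicon, top_n=2):
    print(f"{word}: {score:.2f}")
```

Ranking the whole lexicon rather than returning a single best match makes it straightforward to check whether the correct word falls within the top few choices.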

Figure 5. A test word being matched to a lexicon of words

5. Experimental Results

To evaluate the techniques detailed in Sections 2 to 4, we used samples of handwritten words from the CEDAR benchmark database [17]. In particular, we used all the words contained in the BD/cities directory of the CD-ROM.

5.1 Segmentation Results

All segmentation experiments were conducted using an ANN trained with the backpropagation algorithm. Many experiments were performed, varying settings such as the number of iterations, the number of hidden units, η (eta) and α (alpha). For each experiment the number of inputs remained the same: a 14x3 matrix of pixel densities (42 inputs). The number of outputs was always set to 1. Table 1 shows the top results obtained for the verification of segmentation points when the ANN was trained with 32,034 training patterns (correct and incorrect segmentation points). The number of testing patterns was 3162.

Table 1. Segmentation point results using 32,034 training patterns

Iterations  Hidden Units  η    α    Correct / Test Set  Classification Rate [%]
200         25            0.1  0.1  2568/3162           81.21
400         30            0.1  0.1  2566/3162           81.15
500         30            0.1  0.1  2562/3162           81.02
400         20            0.1  0.1  2559/3162           80.93
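For readers who want a concrete picture of the training procedure, the following is a minimal pure-Python sketch of a one-hidden-layer network trained by backpropagation. It assumes that η is the learning rate and α the momentum term, which is the conventional reading but is not stated explicitly in the text. The class name TinyMLP, the toy patterns and the small layer sizes are illustrative only; the experiments above used 42 inputs and 20 to 30 hidden units.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer, one sigmoid output, online backpropagation."""

    def __init__(self, n_in, n_hidden, eta=0.1, alpha=0.1, seed=0):
        rng = random.Random(seed)
        # The last weight of every unit is its bias.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, needed for the momentum term.
        self.d_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.d_out = [0.0] * (n_hidden + 1)
        self.eta, self.alpha = eta, alpha

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in self.w_hid]
        hb.append(1.0)
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def predict(self, x):
        return self.forward(x)[2]

    def train_pattern(self, x, target):
        """Update the weights for one pattern; return its squared error."""
        xb, hb, out = self.forward(x)
        delta_out = (target - out) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are changed.
        delta_hid = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                     for j in range(len(self.w_hid))]
        for j in range(len(self.w_out)):
            self.d_out[j] = self.eta * delta_out * hb[j] + self.alpha * self.d_out[j]
            self.w_out[j] += self.d_out[j]
        for j, ws in enumerate(self.w_hid):
            for i in range(len(ws)):
                self.d_hid[j][i] = (self.eta * delta_hid[j] * xb[i]
                                    + self.alpha * self.d_hid[j][i])
                ws[i] += self.d_hid[j][i]
        return 0.5 * (target - out) ** 2


# Toy density vectors: target 0.9 marks a correct point, 0.1 an incorrect one.
patterns = [([0.9, 0.8, 0.1, 0.0], 0.9), ([0.1, 0.0, 0.9, 0.8], 0.1)]
net = TinyMLP(n_in=4, n_hidden=3, eta=0.1, alpha=0.1)
first_error = sum(net.train_pattern(x, t) for x, t in patterns)
for _ in range(2000):
    last_error = sum(net.train_pattern(x, t) for x, t in patterns)
print(first_error, last_error)
```

A segmentation point would then be accepted when the network output lies closer to 0.9 than to 0.1; that decision rule is also an assumption of this sketch.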

5.2 Segmented Character Recognition Results

The character recognition experiments were also conducted using a backpropagation neural network. The numbers of characters used for training and testing were 15,297 and 1212 respectively. Each character was normalised to a matrix size of 20x20. The global feature extraction method detailed in Section 2.3 was then used to reduce the number of inputs to the neural network even further. A matrix of dimension 5x5 (25 inputs) was the final result. The number of outputs was 52, representing 26 uppercase characters (A-Z) and 26 lowercase characters (a-z). As with the segmentation point experiments, various settings were examined to obtain the best classification rate. The results obtained for character recognition are presented in Table 2 and are divided into two categories: experiments which distinguished between uppercase and lowercase characters, and experiments which did not.

Table 2. Character recognition results using 15,297 training patterns

Case Sensitive Experiments
Iterations  Hidden Units  η    α    Correct / Test Set  Classification Rate [%]
700         100           0.1  0.1  680/1212            56.11
600         100           0.1  0.1  674/1212            55.61
400         90            0.1  0.1  666/1212            54.95
300         90            0.1  0.1  665/1212            54.87

Non-Case Sensitive Experiments
Iterations  Hidden Units  η    α    Correct / Test Set  Classification Rate [%]
700         100           0.1  0.1  709/1212            58.50
600         100           0.1  0.1  703/1212            58.00
400         90            0.1  0.1  702/1212            57.92
300         90            0.1  0.1  697/1212            57.51

5.3 Word Recognition Using a Lexicon

Following character recognition, sets of characters comprising words were presented to lexicons of size 10, 50 and 100 words. Word test sets of size 40, 148 and 211 were presented to the lexicon. Both the words contained in the lexicon and the words used for testing were randomly selected for the experiments. Top word recognition results for each lexicon size are presented in Table 3. The value N (2, 5 or 10) indicates whether the correct word was located in the top 2, 5 or 10 choices suggested by the lexicon.

Table 3. Word recognition results (recognition rate for top N choices)

Lexicon Size  N=2      N=5      N=10
10            100%     100%     N/A
50            66.67%   71.43%   85.71%
100           50.00%   65.00%   70.00%

6. Discussion of Results

The following sections discuss the experimental results obtained in the previous sections. The first section discusses the results produced by the segmentation phase of our system. Next, the character recognition component is discussed. Finally, word recognition results are explained.

6.1 Classification of Segmentation Points

In this research, the most important component was the segmentation stage. Many researchers mention segmentation as a part of their overall systems; however, few report their findings at the segmentation level. This research has focussed on this very important area and has produced commendable results which can readily be compared with those of other researchers in the field. The neuro-heuristic algorithm obtained results of up to 81.21% for 3162 testing segmentation point patterns. Eastwood et al. [11] presented an ANN-based method for the segmentation of cursive and printed handwriting from the CEDAR CD-ROM, detailing a segmentation accuracy of 75.9%. Han and Sethi [18] achieved 85.7% accuracy using a heuristic algorithm for the segmentation of words on 50 envelopes from real mailpieces. Finally, Yanikoglu and Sandon [19] reported that 97% of letter boundaries from 750 words were correctly located. Although Yanikoglu and Sandon's results are significantly higher than the other results presented here, it must be noted that they did not use a benchmark database of real-world unconstrained words for their experiments. The results for segmentation achieved in this research compare favourably with those of other researchers.

6.2 Classification of Segmented Characters

The problem of segmented character recognition is a difficult one to resolve. Results obtained by researchers are still not as high as those previously obtained for handwritten numeral recognition. Top researchers [20-22] have obtained classification rates ranging from 67% to 80% on samples from the CEDAR CD-ROM. The experimental parameters, procedures and size of word samples for training and testing closest to those described in this research are those of Yamada and Nakano [22]. Their result for case sensitive segmented character recognition was 67.8%. The top case sensitive result presented in Table 2 is just above 56%. Our results are lower; however, it is important to note that Yamada and Nakano used more training samples, and long recognition times were recorded due to the algorithms used. Following ANN training, the classifier in our research recognised over 1000 characters in a few seconds. Therefore, taking into account factors such as speed and simplicity, our classification method has also generated favourable results.

6.3 Overall Word Recognition

The results obtained for overall word recognition were not particularly high. The recognition rates were high only for the smallest lexicon; as the lexicon increased in size, the recognition rate dropped sharply. For lexicons of size 50 and 100, the recognition rates for the top 2 to 10 choices were reasonable, but the rates for the top choices were quite low. We can attribute these low recognition rates to errors in previous stages. In the early stages of character segmentation and recognition, some characters were not segmented properly and were therefore rejected. Even though segmentation rates were high, some words presented to the lexicon were incomplete. This made it very difficult for the string processing algorithm to match and locate the correct words. Also, the main aim of our research was to present a new segmentation technique; recognition at the word level was not given significant attention. Improvements in the early stages of our system, right up to the word recognition stage, are required to improve overall word recognition results.

6.4 Future Research

In further research, all areas of the current system shall be targeted for improvement to increase classification rates. Firstly, the heuristic component of the segmentation system will need to be refined further. The main problem lies with under-segmentation. The heuristic algorithm was designed to keep incorrect segmentations to a minimum so that, when ANN validation took place, fewer errors could be made. However, the trade-off was that some segmentation points were missed. The algorithm will therefore be modified to find more prospective segmentation points. This can be achieved by looking for more features or possibly by enhancing the current feature detection methods. In particular, a postprocessor shall be added to the segmentation phase. Before characters are extracted, the postprocessor will be required to detect substantially difficult segmentation points, usually found either when large uppercase characters cross over into regions occupied by lowercase characters or when two characters are tightly coupled. The character recogniser shall be enhanced by using improved feature extraction techniques. A global feature extraction technique is fast and easy to implement, but does not produce exemplary results. Finally, the lexicon will also need to be updated. Different options such as Dynamic Programming shall be explored to tackle the challenging problem of matching incomplete words to those in the lexicon. However, if the previous two stages can be improved substantially, word recognition results should be much higher.

7. Conclusion

An intelligent segmentation technique has been presented in this paper, producing good results. It was used to segment difficult cursive and printed handwritten words from the CEDAR database. A segmented character recogniser has also been presented as part of an overall handwriting recognition system. Considering the speed and simplicity of the system, our results for character recognition and word recognition are favourable. Implementing the improvements discussed in the previous section should boost the character and word recognition rates significantly. However, the main focus of the research presented in this paper was the segmentation of handwritten words.
It has been noted that very few researchers have published their segmentation results for handwritten word recognition when discussing a complete system. It is therefore hoped that further research can be dedicated to analysing and improving the results of this very important procedure.

References

[1] C. Y. Suen, R. Legault, C. Nadal, M. Cheriet, and L. Lam, Building a New Generation of Handwriting Recognition Systems, Pattern Recognition Letters, Vol. 14, 1993, pp. 305-315.
[2] S-W. Lee, Multilayer Cluster Neural Network for Totally Unconstrained Handwritten Numeral Recognition, Neural Networks, Vol. 8, 1995, pp. 783-792.
[3] H. I. Avi-Itzhak, T. A. Diep, and H. Garland, High Accuracy Optical Character Recognition using Neural Networks with Centroid Dithering, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 17, 1995, pp. 218-224.
[4] S-W. Lee, Off-Line Recognition of Totally Unconstrained Handwritten Numerals Using Multilayer Cluster Neural Network, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 18, 1996, pp. 648-652.
[5] S-B. Cho, Neural-Network Classifiers for Recognizing Totally Unconstrained Handwritten Numerals, IEEE Trans. on Neural Networks, Vol. 8, 1997, pp. 43-53.
[6] R. M. Bozinovic and S. N. Srihari, Off-Line Cursive Script Word Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 11, 1989, pp. 68-83.
[7] N. W. Strathy, C. Y. Suen, and A. Krzyzak, Segmentation of Handwritten Digits using Contour Features, ICDAR '93, 1993, pp. 577-580.
[8] B. A. Yanikoglu and P. A. Sandon, Off-line cursive handwriting recognition using style parameters, Tech. Report PCS-TR93-192, Dartmouth College, NH, 1993.
[9] J-H. Chiang, A Hybrid Neural Model in Handwritten Word Recognition, Neural Networks, Vol. 11, 1998, pp. 337-346.

[10] G. L. Martin, M. Rashid, and J. A. Pittman, Integrated Segmentation and Recognition through Exhaustive Scans or Learned Saccadic Jumps, Int'l J. Pattern Recognition and Artificial Intelligence, Vol. 7, 1993, pp. 831-847.
[11] B. Eastwood, A. Jennings, and A. Harvey, A Feature Based Neural Network Segmenter for Handwritten Words, Int'l Conf. Computational Intelligence and Multimedia Applications, Gold Coast, Australia, 1997, pp. 286-290.
[12] S. N. Srihari, Recognition of Handwritten and Machine-printed Text for Postal Address Interpretation, Pattern Recognition Letters, Vol. 14, 1993, pp. 291-302.
[13] M. Gilloux, Research into the New Generation of Character and Mailing Address Recognition Systems at the French Post Office Research Center, Pattern Recognition Letters, Vol. 14, 1993, pp. 267-276.
[14] R. G. Casey and E. Lecolinet, A Survey of Methods and Strategies in Character Segmentation, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 18, 1996, pp. 690-706.
[15] Y. Lu and M. Shridhar, Character Segmentation in Handwritten Words: An Overview, Pattern Recognition, Vol. 29, 1996, pp. 77-96.
[16] N. Otsu, A threshold selection method from gray level histograms, IEEE Trans. Systems, Man and Cybernetics, Vol. SMC-9, 1979, pp. 62-66.
[17] J. J. Hull, A Database for Handwritten Text Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 16, 1994, pp. 550-554.
[18] K. Han and I. K. Sethi, Off-line Cursive Handwriting Segmentation, ICDAR '95, Montreal, Canada, 1995, pp. 894-897.
[19] B. Yanikoglu and P. A. Sandon, Segmentation of Off-line Cursive Handwriting using Linear Programming, Pattern Recognition, Vol. 31, 1998, pp. 1825-1833.
[20] P. D. Gader, M. Mohamed, and J-H. Chiang, Handwritten Word Recognition with Character and Inter-Character Neural Networks, IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 27, 1997, pp. 158-164.
[21] F. Kimura, N. Kayahara, Y. Miyake, and M. Shridhar, Machine and Human Recognition of Segmented Characters from Handwritten Words, ICDAR '97, Ulm, Germany, 1997, pp. 866-869.
[22] H. Yamada and Y. Nakano, Cursive Handwritten Word Recognition Using Multiple Segmentation Determined by Contour Analysis, IEICE Trans. on Information and Systems, Vol. E79-D, 1996, pp. 464-470.