Verification of Unconstrained Handwritten Words at Character Level

2010 12th International Conference on Frontiers in Handwriting Recognition

Alessandro L. Koerich
Dept. of Computer Science, PUCPR
Dept. of Electrical Engineering, UFPR
Curitiba, PR, Brazil
alekoe@computer.org

Alceu de S. Britto Jr.
Dept. of Computer Science, PUCPR
Curitiba, PR, Brazil
alceu@ppgia.pucpr.br

Luiz Eduardo S. de Oliveira
Dept. of Computer Science, UFPR
Curitiba, PR, Brazil
lesoliveira@inf.ufpr.br

Abstract

In this paper we present a verification module that takes as input the output of a word recognizer based on the segmentation-recognition paradigm. The word recognizer models words as the concatenation of character hidden Markov models (HMMs) and outputs a list of the Top N best word hypotheses, including their likelihoods and the segmentation points of the words into subwords, which ideally should be characters. The verification module uses the segmentation points provided by the word recognizer for each word hypothesis to extract different features from each subword. A classifier based on a multilayer perceptron neural network assigns a character class (A-Z) and estimates the a posteriori probability for each subword that makes up a word. Further, both the character class and the a posteriori probabilities are combined with the original output of the word recognizer to re-rank the word hypotheses in the Top N list. Experimental results show that the verification module improves the Top 1 recognition rate by 3.9% for an 85,092-word recognition task.

Keywords-Verification; Character recognition; Word recognition.

I. INTRODUCTION

During the last few years, hidden Markov models (HMMs) have become a very popular approach in handwriting recognition. One of the main reasons is their high performance in medium to large vocabulary applications where segmentation-recognition methods are used to cope with the difficulties of segmenting words into characters.
Segmentation-recognition methods first loosely segment (oversegment) words into graphemes that ideally consist of either characters or parts of characters, and use dynamic programming techniques together with a lexicon to find the definitive segmentation as well as the best word hypotheses [1], [2], [3], [4], [5]. Many handwriting recognition systems use HMMs to model sub-word units (characters) and the Viterbi algorithm to find the best match between a sequence of observations and the models [1], [2], [4]. The Viterbi algorithm is optimal in the sense of maximum likelihood and it looks at the match of the whole sequence of features (observations) before deciding on the most likely state sequence. This is particularly valuable in applications such as handwritten word recognition where an intermediate character may be garbled or lost, but the overall sense of the word may be detectable. On the other hand, the local information is somewhat overlooked in such an approach. Furthermore, the conditional-independence imposed by the Markov Model (each observation is independent of its neighbors) prevents an HMM from taking full advantage of the correlation that exists among the observations of a single character [6]. In this paper we propose a verification module that attempts to overcome such a deficiency of the Viterbi algorithm in considering local information by carrying out recognition at character level in a post-processing procedure. The proposed verification module uses the segmentation information provided by the word recognizer to return to the input word image and, for each segment, that is supposed to represent a whole character, extract another set of features more suitable for isolated character recognition. This feature set is used to train a multilayer perceptron (MLP) neural network classifier [7]. Further, the results of the character classifier are used to re-rank the list of Top N best word hypotheses provided by the word recognizer. 
The focus of this paper is on the integration of the results of the isolated character classifier and the word recognizer. The organization of this paper is as follows. Section II presents the large vocabulary off-line handwritten word recognizer. Section III presents the isolated handwritten character classifier. The verification of words at character level is presented in Section IV. Section V shows the experimental results of the recognition-verification scheme. Finally, the concluding remarks are presented in the last section.

II. WORD RECOGNITION

The baseline handwritten word recognition system is composed of several modules: pre-processing, segmentation, feature extraction, training and recognition. The pre-processing normalizes the word images in terms of slant and size. Afterwards, the images are segmented into graphemes and the sequence of segments is transformed into a sequence of symbols (or features). There is a set of seventy classes comprising letters (26 uppercase and 26 lowercase), digits (10) and special symbols (8) that are modeled by 10-state transition-based HMMs with forward and null transitions [2]. The character HMMs were trained and validated on sets of 12,049 and 3,470 words, respectively, using the maximum likelihood criterion and the Baum-Welch algorithm.

978-0-7695-4221-8/10 $26.00 2010 IEEE, DOI 10.1109/ICFHR.2010.14

The word recognizer performs the demanding task of assigning a pattern to one of G possible classes, where G is the size of the lexicon. The lexicon has 85,092 entries representing city names. Both single and compound words, such as Chire en Montreiul, are present in the lexicon. The average length of the words is 12.12 characters, and the shortest and the longest words have two and fifty-six characters, respectively. Recognition is carried out by a lexicon-driven search based on a two-pass Viterbi-like decoding algorithm [4]. Figure 1 shows the average performance of the word recognizer on a testing set with 4,674 images for different lexicon sizes. The results are shown for the Top 1, Top 2, Top 5, and Top 10 best choices.

The output of the word recognizer is a list with the Top N best word hypotheses ranked according to the a posteriori probability assigned to each word hypothesis. Furthermore, the segmentation of the word hypotheses into subwords is also available, and it is obtained by backtracking the best state sequence. Therefore, the output of the word recognizer, denoted as $\Psi$, can be represented by a triple:

$$\Psi = \{L^n, S^n, P^n\} \qquad (1)$$

where $L^n = l_1^n l_2^n \ldots l_H^n$ is the label of the n-th word hypothesis, given as a sequence of ASCII characters, $S^n = s_1^n s_2^n \ldots s_H^n$ is the set of segments that corresponds to the segmentation of the n-th word hypothesis into subwords, and $P^n$ is the a posteriori probability assigned to the n-th word hypothesis by the classifier. $H$ is the number of characters and $s_h^n$ is the segment that represents the h-th character of the n-th word hypothesis.

III. CHARACTER RECOGNITION

Handwritten character recognition has been the subject of much attention in the field of handwriting recognition. Several proposals to solve this problem have been presented throughout the last decade [7], [8], [9], [10], [11], [12].
However, most of the research efforts in character recognition have focused on the recognition of digits, due to the reduced number of classes when compared with letters. Even if these two problems are very similar, the recognition of letters is a much more difficult task due to the number of classes (up to 52) and the ambiguity between characters of different classes.

To build an unconstrained character recognizer we have to define a classifier and a set of features from which the feature vectors representing the characters are built. The feature set adopted was previously validated on the NIST database and it consists of global (projections and profiles) and local (directional histogram) features. The former were extracted from the whole character, while the latter were extracted on a 3 x 3 grid. The combination of the features yields a 108-dimensional feature vector. More details on the unconstrained character recognizer can be found in [7].

Neural network classifiers exhibit powerful discriminative properties and they have been used in handwriting recognition, particularly with digits, isolated characters, and words in small vocabularies. However, the use of neural networks in the recognition of handwritten words from larger vocabularies depends heavily on a very efficient segmentation scheme. Due to the lack of such an efficient segmentation scheme, neural networks are usually employed in combination with other classifiers, e.g. hybrid NN/HMM approaches that use neural networks to estimate a priori probabilities [5], or used to validate grapheme hypotheses generated by HMM classifiers [13]. Besides that, the choice of a multilayer perceptron (MLP) as the classifier to perform the character recognition task was determined by several constraints, such as recognition speed, unbalanced distribution of samples per class, and very few samples for some classes.
Even though some of the requirements are not fulfilled, a multilayer perceptron neural network was chosen as the classifier for the following reasons: it is fast, it has a powerful nonlinear decision capability, it is easy to implement, it generalizes well, and it estimates Bayesian a posteriori probabilities at the output. The architecture of the network, which has 108 neurons at the input layer, 90 at the hidden layer and 26 at the output layer, was determined in an exploratory study which indicated that better classification results are achieved when the uppercase and lowercase versions of the same character are merged into a single class.

Table I shows the number of samples and recognition rates for the three datasets of NIST SD19 and a proprietary database. From the hsf0, hsf1, hsf2, and hsf3 sets of the NIST database, 1,660 samples per character class (uppercase A-Z) and 1,440 samples per character class (lowercase a-z) were taken randomly for training the neural classifier using the backpropagation algorithm. The feature vectors generated from the hsf7 set were used as the validation set during training, to monitor generalization and to stop the training at the minimum of the validation error. The feature vectors generated from the hsf4 set were used to test the performance of the classifier.

The proprietary character database was built automatically from the three datasets described in the previous section. For the training and validation sets, we selected the words correctly recognized in an 85,092-word recognition task and, based on the segmentation of such words into characters given by the backtracking of the Viterbi algorithm, recovered the segments representing characters. It is important to mention that, for the experiments reported in this paper, no cleaning or visual inspection of the resulting characters was done. Therefore, the character database may contain some garbled shapes that do not actually represent characters.
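The 108-90-26 architecture described above can be illustrated with a minimal forward-pass sketch. Only the layer sizes and the merged A-Z classes come from the text; the tanh hidden activation, the softmax output normalization, the random weights, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math
import random
import string

N_INPUT, N_HIDDEN, N_OUTPUT = 108, 90, 26   # layer sizes stated in the text
CLASSES = string.ascii_uppercase            # merged upper/lowercase classes A-Z


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; a placeholder for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(weights, biases, x):
    """Compute W x + b for one layer."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Normalize scores so that they are positive and sum to one."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden, output):
    """Return one posterior estimate per class for a 108-dimensional vector."""
    h = [math.tanh(v) for v in affine(*hidden, x)]   # assumed hidden activation
    return softmax(affine(*output, h))               # assumed output normalization


rng = random.Random(0)
hidden = init_layer(N_INPUT, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUT, rng)
features = [rng.random() for _ in range(N_INPUT)]    # stand-in feature vector
posteriors = forward(features, hidden, output)

# The best character hypothesis is the class with the highest posterior.
best = max(range(N_OUTPUT), key=posteriors.__getitem__)
best_class, best_prob = CLASSES[best], posteriors[best]
```

With untrained weights the posteriors are close to uniform; the point of the sketch is only the shape of the computation and the way a class label and its probability are read off the output layer.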
The best character hypothesis at the output of the character recognizer is the character class which provides the highest a posteriori probability. Therefore, the output of the unconstrained character recognizer, denoted as $\Gamma$, can be represented by:

$$\Gamma = \{C_l^n, P_l^n\} \qquad (2)$$

where $C_l^n$ is the class of the l-th character of the n-th word hypothesis, given as an ASCII character, and $P_l^n$ is the a posteriori probability assigned to the character class by the MLP neural network classifier.

Figure 1. Performance of the handwritten word recognizer for different lexicon sizes.

Table I. SIZE OF THE DATASETS AND RESULTS FOR THE RECOGNITION OF ISOLATED HANDWRITTEN CHARACTERS FOR THE NIST AND THE PROPRIETARY DATABASE.

             NIST                             Proprietary
Dataset      # of Samples   Recog. Rate (%)   # of Samples   Recog. Rate (%)
Training     80,600         94.71             84,811         76.48
Validation   23,670         89.98             27,282         73.54
Testing      16,900         88.10             34,731         73.51

IV. VERIFICATION OF WORD CHARACTERS

Now, our interest is in combining the information provided by both classifiers, that is, the a posteriori probability of each word hypothesis provided by the word recognizer, and the classes and a posteriori probabilities of each character that makes up the word hypothesis, which are provided by the unconstrained character recognizer. The methods that can be used to combine multiple classifier decisions depend on the types of information produced by the individual classifiers. The word recognizer produces at the output a ranked list of classes (words) with attached a posteriori probabilities. The output of the character classifier is the class $C_l$ that best matches the input data together with the a posteriori probability $P_l$. Therefore, the problem of verification is to integrate these outputs in a convenient manner to improve the final recognition accuracy. However, due to the uncertainty in the correct segmentation of words into characters, we restrict the verification to the first character of each word.
We investigate different manners of combining the outputs of the classifiers: the product of the a posteriori probabilities provided by both classifiers given $c_1^n = l_1^n$; a heuristic rule that shifts down the word hypotheses in the Top N list for which $c_1^n \neq l_1^n$; and a heuristic rule that re-ranks word hypotheses only when the difference between the probabilities of the Top n and Top n + 1 word hypotheses is greater than a threshold.

A. Verification using Segments (VS)

In this verification scheme the output of the isolated character classifier is taken as stated in Section III. The task of the verifier is to find the character class (A-Z) with the highest a posteriori probability for each segment ($s_1^n$).

Therefore, the output of the verifier is a character class with its a posteriori probability, that is, $\{c_1^n, P_{c_1}^n\}$. Figure 2 shows this verification scheme, where the interaction between the isolated character recognizer and the word recognizer is presented. Given the tuple $\{P^n, L^n, S^n\}$ provided by the word recognizer, this verification scheme uses $S^n$ to locate in the input image the segments representing the subwords. Another feature extraction module is used to extract features from such segments and a different feature vector is formed. Then, the task of the verifier is to assign Bayesian a posteriori probabilities to this new feature vector, which hopefully represents a character, as well as the class to which the segment may belong (A-Z). Having the label and the a posteriori probability of the segments, the result can be combined with the output of the word classifier by a suitable rule. To combine the outputs of both classifiers, we use some heuristic rules:

Rule 1: If the class provided by the isolated character classifier coincides with the character class provided by the word recognizer, that is, $c_1^n = l_1^n$, accept the hypothesis provided by the word recognizer.

Rule 2: If the class provided by the isolated character classifier does not coincide with the character class provided by the word recognizer, that is, $c_1^n \neq l_1^n$, check the probability estimated by the isolated character classifier. If this probability is low, that is, $P_{c_1}^n < T_c^L$, accept the hypothesis provided by the word recognizer. If the probability is high, that is, $P_{c_1}^n > T_c^H$, shift the word hypothesis down in the Top N list.

However, the second rule is not sufficient because the character recognizer is prone to errors for certain classes that have similar shapes. The most frequent confusions occur between the classes I and J, g and q, and O and D.
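Rules 1 and 2 can be sketched as a small decision function applied to one word hypothesis at a time. The function name, the threshold values, the dictionary layout, and the treatment of probabilities that fall between the low and high thresholds (accepted here) are assumptions for illustration; the text only states that the thresholds are class dependent and tuned on the validation set.

```python
import string


def verify_first_character(word_label, char_class, char_prob, thresholds):
    """Return 'accept' or 'shift_down' for one word hypothesis.

    word_label -- label L^n proposed by the word recognizer
    char_class -- class c_1^n chosen by the character classifier (A-Z)
    char_prob  -- a posteriori probability of that class
    thresholds -- hypothetical mapping: class -> (low, high) thresholds
    """
    t_low, t_high = thresholds[char_class]

    # Rule 1: both recognizers agree on the first character.
    # Upper and lower case are merged, as in the character classifier.
    if char_class == word_label[0].upper():
        return "accept"

    # Rule 2: disagreement, so look at the classifier's confidence.
    if char_prob > t_high:
        return "shift_down"      # confident disagreement
    if char_prob < t_low:
        return "accept"          # unreliable classifier decision
    return "accept"              # in-between case: an assumption, not in the text


# Made-up thresholds, identical for every class, just for the example.
thresholds = {c: (0.30, 0.80) for c in string.ascii_uppercase}

decisions = [
    verify_first_character("Paris", "P", 0.90, thresholds),
    verify_first_character("Paris", "B", 0.95, thresholds),
    verify_first_character("Paris", "B", 0.20, thresholds),
]
```

In this toy run the hypothesis is kept when the classifier agrees or disagrees weakly, and it is marked for shifting down only when the classifier confidently reports a different first letter.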
The confusions produced by the character recognizer between classes with similar shapes may therefore cause undesirable changes in the Top N list, shifting up wrong word hypotheses. For this reason, we have added a third rule to be used in conjunction with the second rule:

Rule 3: A word hypothesis $L^n$ in the Top N list will be shifted down only if the probability $P^n$ assigned by the word recognizer is greater than the probability of the word hypothesis $L^{n+1}$ plus a threshold $T_w$.

In Rule 2, $T_c^L$ and $T_c^H$ are class-dependent thresholds whose values are determined on the validation dataset. In Rule 3, $T_w$ is a threshold whose value is also determined on the validation dataset.

B. Verification using Segments and Classes (VSC)

In this verification scheme the output of the isolated character classifier is taken in a different manner from that stated in Section III. Instead of using the neural network as a conventional classifier, we use it as a probability estimator to assign a probability to the character class given by the word hypotheses. For example, if the first word hypothesis provided by the word recognizer is Paris, we provide the image corresponding to the first character to the isolated character recognizer and, at the output, we take the a posteriori probability that it assigns to the class P, in other words, $c_1^n = l_1^n$. After estimating the probabilities for the first characters of all word hypotheses in the Top N list, we simply combine them with the probabilities provided by the word recognizer for all word hypotheses.

Table II. WORD RECOGNITION RATES WITH AND WITHOUT VERIFICATION FOR AN 85,100-WORD RECOGNITION TASK.

Configuration                                      Recognition Rate (%)
Word Recognizer                                    68.21
Word Recognizer + Verification ($VS$)              71.47
Word Recognizer + Verification ($VSC_{12}$)        70.43
Word Recognizer + Verification ($VSC_{123}$)       72.11
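The combination just described, in which the word probability is multiplied by the probability that the character classifier assigns to the hypothesized first letter, can be sketched as follows. The function name, the data layout, and the toy probabilities (including the made-up competitor "Baris") are illustrative assumptions; only the product and the subsequent re-ranking come from the text.

```python
def rerank_by_product(hypotheses, char_posteriors):
    """Re-rank a Top N list by word probability times first-character probability.

    hypotheses      -- list of (label, word_probability), best first
    char_posteriors -- one dict per hypothesis mapping class (A-Z) to the
                       posterior estimated on that hypothesis's first segment
    """
    scored = []
    for (label, word_prob), posteriors in zip(hypotheses, char_posteriors):
        first_class = label[0].upper()              # classes are case-merged
        char_prob = posteriors.get(first_class, 0.0)
        scored.append((label, word_prob * char_prob))
    # Highest composite probability first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy Top 2 list: the word recognizer slightly prefers the wrong hypothesis.
hypotheses = [("Baris", 0.50), ("Paris", 0.45)]
# Each hypothesis has its own segmentation, hence its own posterior estimates.
char_posteriors = [
    {"B": 0.10, "P": 0.85},
    {"B": 0.10, "P": 0.85},
]
ranked = rerank_by_product(hypotheses, char_posteriors)
```

Here the composite scores are 0.50 x 0.10 = 0.05 for "Baris" and 0.45 x 0.85 = 0.3825 for "Paris", so the second hypothesis moves to the top of the list.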
The composite probability, denoted as $\hat{P}^n$, for the n-th word hypothesis is given by:

$$\hat{P}^n = P^n \, P_{c_1}^n \qquad (3)$$

where $P^n$ is the probability assigned by the word recognizer and $P_{c_1}^n$ is the probability assigned by the character recognizer to the first character. Based on the composite probabilities $\hat{P}^n$, the Top N list is re-ranked. It is expected that the isolated character recognizer estimates low probability values when the first character of a word hypothesis is not correct. In this case, such word hypotheses will be shifted down in the Top N list. Figure 3 shows the main modules of the word recognizer and the isolated character recognizer, as well as the interactions between both.

V. EXPERIMENTS

Experiments have been carried out with unconstrained handwritten words of the proprietary database that is described in Section II. The threshold parameters ($T_c^H$, $T_c^L$, $T_w$) were set using the validation set. The performance of the verification scheme was evaluated on the testing dataset using the same threshold values determined beforehand. Table II shows the results on the testing dataset for the word recognizer alone as well as the word recognizer combined with the proposed verification schemes. In this table, $VS$ denotes the verification scheme based on the product of probabilities, $VSC_{12}$ denotes the heuristic verification scheme that employs the first and the second rules, and $VSC_{123}$ denotes the heuristic verification scheme that employs the first, second and third rules.

The recognition rates presented in Table II show that improvements in the word recognition rate are achieved by using the verification scheme. The verification scheme that multiplies the probability of the first character by the word probability is reasonably effective if we consider that we are neglecting the probabilities of all other characters that form the word hypothesis. Hopefully, such accuracy can be further improved by taking into account all characters. For the verification schemes that employ heuristic rules, the improvement in the recognition rate when compared with the word recognizer alone is also notable. The scheme that uses the three rules has achieved the best recognition results. However, the main problem with such heuristic schemes is the need to set several thresholds. If such thresholds are not carefully chosen, the performance of the verification scheme is severely compromised.

Figure 2. An overview of the verification of unconstrained handwritten words: only the segments are provided to the isolated character recognizer.

VI. CONCLUSIONS

In this paper we have presented a simple scheme for the verification of unconstrained handwritten words. The verification scheme uses the segmentation points produced by a word recognizer which is based on a segmentation-recognition strategy. The verification scheme is based on an unconstrained isolated character recognizer that operates in two distinct manners: estimating the a posteriori probability of segments representing characters, given their classes, and assigning character classes and a posteriori probabilities to the segments. Depending on the value of the probability estimated by the character recognizer and the probability of each word, the verification takes place and re-ranks the Top N list, shifting up the words that start with the same character recognized by the isolated character recognizer. This verification scheme is very limited since it attempts to verify only the first character of the words, neglecting the other characters that make up a word and which are also available at the output of the word recognizer.
However, the results are very promising and our future work will focus on the verification of all characters that make up the word hypotheses, as well as on more sophisticated strategies to combine the word classifier and the character classifier.

REFERENCES

[1] M. Y. Chen, A. Kundu, and S. N. Srihari, "Variable duration hidden Markov model and morphological segmentation for handwritten word recognition," IEEE Transactions on Image Processing, vol. 4, no. 12, pp. 1675-1688, 1995.
[2] A. El-Yacoubi, M. Gilloux, R. Sabourin, and C. Y. Suen, "Unconstrained handwritten word recognition using hidden Markov models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 8, pp. 752-760, 1999.
[3] A. L. Koerich, R. Sabourin, and C. Y. Suen, "Large vocabulary off-line handwriting recognition: A survey," Pattern Analysis and Applications, vol. 6, no. 2, pp. 97-127, 2003.
[4] A. L. Koerich, R. Sabourin, and C. Y. Suen, "Recognition and verification of unconstrained handwritten words," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1509-1522, 2005.
[5] A. Senior, "Off-line cursive handwriting recognition using recurrent neural networks," Ph.D. dissertation, University of Cambridge, Cambridge, England, September 1994.

Figure 3. An overview of the verification of unconstrained handwritten words: the segments and their classes are provided to the isolated character recognizer.

[6] G. Zavaliagkos, Y. Zhao, R. Schwartz, and J. Makhoul, "A hybrid segmental neural net/hidden Markov model system for continuous speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 1, pp. 151-160, 1994.
[7] A. L. Koerich and P. R. Kalva, "Unconstrained handwritten character recognition using metaclasses of characters," in Proc. International Conference on Image Processing, Genova, Italy, 2005, pp. 542-545.
[8] F. Camastra and A. Vinciarelli, "Cursive character recognition by learning vector quantization," Pattern Recognition Letters, vol. 22, pp. 625-629, 2001.
[9] L. Heutte, T. Paquet, J. V. Moreau, Y. Lecourtier, and C. Olivier, "A structural/statistical feature based vector for handwritten character recognition," Pattern Recognition Letters, vol. 19, pp. 629-641, 1998.
[10] F. Kimura, N. Kayahara, Y. Miyake, and M. Shridhar, "Machine and human recognition of segmented characters from handwritten words," in Proc. 4th International Conference on Document Analysis and Recognition, Ulm, Germany, 1997, pp. 866-869.
[11] E. Vellasques, L. E. S. Oliveira, A. S. Britto Jr., A. L. Koerich, and R. Sabourin, "Filtering segmentation cuts for digit string recognition," Pattern Recognition, vol. 41, no. 10, pp. 3044-3053, 2008.
[12] F. Wang, L. Vuurpijl, and L. Schomaker, "Support vector machines for the classification of western handwritten capitals," in Proc. 7th International Workshop on Frontiers in Handwriting Recognition, Amsterdam, Netherlands, 2000, pp. 167-176.
[13] S. J. Cho, J. Kim, and J. H. Kim, "Verification of graphemes using neural networks in an HMM-based on-line Korean handwriting recognition system," in Proc. 7th International Workshop on Frontiers in Handwriting Recognition, Amsterdam, Netherlands, 2000, pp. 219-228.