An Improved Segmentation of Online English Handwritten Text Using Recurrent Neural Networks


Cuong Tuan Nguyen and Masaki Nakagawa
Department of Computer and Information Sciences, Tokyo University of Agriculture and Technology
ntcuong2103@gmail.com, nakagawa@cc.tuat.ac.jp

Abstract

Segmentation for online handwritten text recognition is best performed using the context of strokes written before and after each candidate segmentation point. This paper presents an application of Bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks to the segmentation of online handwritten English text. The networks incorporate long-range context from both the forward and backward directions to improve the confidence of segmentation decisions under uncertainty. We show that applying the method in semi-incremental recognition of online handwritten English text reduces waiting time by up to 62% and processing time by about 50%. Moreover, the recognition rate of the system improves markedly, by 3 points from 71.7%.

1. Introduction

Owing to the spread of pen-based and touch-based devices such as tablet PCs, smartphones, electronic whiteboards, and digital pens, online handwriting recognition is receiving renewed attention. In particular, online handwritten text recognition is a practical input method for devices without a keyboard [1, 2]. Research focuses on improving the recognition rate and reducing the processing time and dictionary size of recognition systems [3, 4], which enables a handwritten text recognition system to run effectively and reliably on small portable devices.

Compared to isolated character or word recognition, handwritten text recognition faces the difficulty of word and character segmentation due to the inherent ambiguity of segmentation. Moreover, in continuous handwriting, characters tend to be written more cursively. To deal with this problem, applying context to segmentation is crucial.
The typical approach uses over-segmentation in combination with recognition results and linguistic context [3]. Based on geometric features, all potential segmentation positions are determined to build up hypothetical segmentation paths. Recognition results and linguistic context are then combined to evaluate the paths and find the best one. The SVM, which has been widely applied to numerous classification tasks, achieves good performance on segmentation of online handwritten text [5]. The segmentation task, however, can be further improved by incorporating context from both the forward and backward directions. Bidirectional Long Short-Term Memory (BLSTM) [12], an improved version of the bidirectional recurrent neural network, allows the network to access long-range context and has proven effective in many sequence classification tasks.

There are two methods for text recognition. The batch recognition method, which recognizes handwritten text after a user has finished writing, can easily employ the full context to achieve a high recognition rate [3]. If all segmentation and recognition processing is deferred until the whole text is written, however, it suffers from long waiting time: the more text is written, the longer the wait. The other is the incremental recognition method [6, 7], which recognizes handwriting while a user is writing. Although it does not incur a long waiting time after the user has finished writing, it may degrade the recognition rate due to local processing of every stroke (a sequence of finger-tip or pen-tip coordinates from finger/pen-down to finger/pen-up) and increase the total CPU time due to repeated processing after every stroke. The semi-incremental recognition method, which resumes segmentation and recognition as strokes arrive, exploits context for both segmentation and recognition.
It achieves very small waiting time and low CPU load with no significant loss in recognition rate for both Japanese and English text [8, 9]. In this work, we apply BLSTM to improve segmentation and evaluate its effect on the semi-incremental English recognition method.

2. Segmentation of online handwritten text

There are two main streams for recognizing online handwritten text: the segmentation-free method and the dissection method. In this paper, we focus on the dissection method, since it is better suited to Chinese and Japanese text recognition [1, 3] and could produce better results even for Western handwriting recognition, for which the segmentation-free method has been dominant.

2.1. Segmentation-recognition strategy

Online handwritten text recognition deals with the problem of recognizing handwritten text consisting of many text lines. Handwritten text is first segmented into text lines, and then each text line is segmented into words. Segmentation can be hard-decision (yes or no) or soft-decision (allowing multiple possibilities). Segmentation is made based on geometric layout features (e.g., gaps between strokes, stroke histograms, stroke interrelationships, and so on). Due to the instability and ambiguity of these features in practical handwriting, however, it is difficult to segment without using recognition cues and linguistic context. Thus, we employ the soft-decision approach.

Segmentation-recognition is accomplished in two steps: over-segmentation and path evaluation and search. A text line is over-segmented into primitive segments such that each segment composes a single word or a part of a word. A segment or a sequence of a few consecutive segments is taken as a candidate word pattern, which is recognized by a word recognizer producing a list of candidate categories. The multiple ways of segmenting into candidate words and the multiple ways of recognizing each candidate are represented by a segmentation-recognition candidate lattice [3]. Text recognition is performed by searching for the best path in the lattice, considering geometric and linguistic context as well as word recognition scores.

2.2. Features for segmentation

Starting from the local and global features described in [9], we extend the feature set to the nine geometric features in Table 2, using the terms defined in Table 1:

  Sp      Immediately preceding stroke
  Ss      Immediately succeeding stroke
  Bp      Bounding box of Sp
  Bs      Bounding box of Ss
  Bp_all  Bounding box of all preceding strokes
  Bs_all  Bounding box of all succeeding strokes
  P       Pattern of all strokes
  Psub    Sub-pattern of Sp and Ss

Table 1. Terms used in the feature definitions.
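As an illustrative sketch only (not the authors' implementation), bounding-box features of this kind can be computed from raw stroke point lists as follows; the stroke representation and helper names are assumptions made for the example:

```python
# Sketch of bounding-box geometric features for off-stroke classification.
# A stroke is assumed to be a list of (x, y) pen-tip points; the helper
# names and exact feature definitions are illustrative, not reference code.

def bounding_box(stroke):
    """Return (x_min, y_min, x_max, y_max) of a stroke."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    return min(xs), min(ys), max(xs), max(ys)

def overlap_1d(a_min, a_max, b_min, b_max):
    """Signed overlap of two intervals (negative means a gap)."""
    return min(a_max, b_max) - max(a_min, b_min)

def geometric_features(s_p, s_s):
    """Features of the off-stroke between preceding stroke s_p and
    succeeding stroke s_s (cf. F4, F5, F8, F9 in Table 2)."""
    bp = bounding_box(s_p)
    bs = bounding_box(s_s)
    return {
        "overlap_x": overlap_1d(bp[0], bp[2], bs[0], bs[2]),          # F4
        "overlap_y": overlap_1d(bp[1], bp[3], bs[1], bs[3]),          # F5
        "width_ratio": (bs[2] - bs[0]) / max(bp[2] - bp[0], 1e-6),    # F8
        "height_ratio": (bs[3] - bs[1]) / max(bp[3] - bp[1], 1e-6),   # F9
    }

# Example: two horizontally adjacent strokes with a small x-overlap.
f = geometric_features([(0, 0), (10, 5)], [(8, 0), (18, 5)])
```

The remaining features (distances between the cumulative boxes, average stroke lengths, centroid angle) would be computed in the same spirit from Bp_all, Bs_all, P, and Psub.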
  F1  Distance between Bs_all and Bp_all along the x-axis
  F2  Average horizontal stroke length of P
  F3  Average horizontal stroke length of Psub
  F4  Overlap length between Bp and Bs along the x-axis
  F5  Overlap length between Bp and Bs along the y-axis
  F6  Minimum point distance between Ss and Sp
  F7  Angle between the x-axis and the vector connecting the centroids of Bs and Bp
  F8  Ratio of Bs width to Bp width
  F9  Ratio of Bs height to Bp height

Table 2. Features for English word segmentation.

2.3. Segmentation by an SVM classifier

For word over-segmentation of a text line, the work in [9] uses an SVM classifier to classify each off-stroke into two classes: segmentation point (SP) or non-segmentation point (NSP). An SP off-stroke separates two words, while an NSP off-stroke lies within a word. Off-strokes with low confidence are classified as undecided points (UP). In training the SVM, however, due to the imbalance between the numbers of positive and negative training patterns (i.e., the number of SPs and that of NSPs), we need to adjust the costs of false positives and false negatives [10]. The higher the cost of false positives, the higher the precision of determining SPs; the same applies to false negatives and the precision of determining NSPs. We therefore use a combination of two SVMs: one with high precision for determining SPs, the other with high precision for determining NSPs.

2.4. Segmentation by a BLSTM classifier

One of the key benefits of RNNs is their ability to use previous context. For standard RNN architectures, however, the range of context that can be accessed in practice is limited by the vanishing gradient problem [11]. Long Short-Term Memory (LSTM) [11] is an RNN architecture designed to address this problem. An LSTM layer consists of multiple recurrently connected memory blocks. Each block contains a set of internal units, known as cells, whose activations are controlled by three multiplicative gate units.
The gates allow the cells to store and access information over long periods of time. For many tasks it is useful to have access to future as well as past context. Bidirectional LSTM (BLSTM) provides this [12] by using two separate hidden layers that process the input in the forward and backward directions, both connected to the same output layer, giving access to long-range bidirectional context.

We use BLSTM to exploit the context of strokes written before and after an off-stroke when classifying that off-stroke for segmentation. Training a BLSTM does not suffer from the imbalance between the numbers of class patterns. We therefore use a BLSTM with two thresholds for over-segmentation. For over-segmentation, we need to find all potential segmentation off-strokes (which may later be determined to be segmentation or non-segmentation points); the rest are non-segmentation points. Thus, we set a threshold TH1 and classify an off-stroke as a non-segmentation point if its score is below TH1. Likewise, we set another threshold TH2 and classify an off-stroke as a segmentation point if its score is above TH2. Off-strokes whose scores fall between TH1 and TH2 are classified as UP. Fig. 1 illustrates this method.
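The two-threshold decision rule can be sketched as follows. TH1 = 0.1 and TH2 = 0.9 are the values reported later in Section 4.2; the scores in the example are made-up placeholders, not actual BLSTM outputs:

```python
# Over-segmentation from per-off-stroke BLSTM scores using two thresholds.
# Scores are assumed to lie in [0, 1], with higher values indicating a
# more likely segmentation point. Example scores are illustrative.

TH1 = 0.1  # below this: non-segmentation point (NSP)
TH2 = 0.9  # above this: segmentation point (SP)

def classify_off_strokes(scores, th1=TH1, th2=TH2):
    """Label each off-stroke as SP, NSP, or UP (undecided)."""
    labels = []
    for s in scores:
        if s < th1:
            labels.append("NSP")
        elif s > th2:
            labels.append("SP")
        else:
            labels.append("UP")  # resolved later by recognition and path search
    return labels

labels = classify_off_strokes([0.02, 0.95, 0.4])  # -> ["NSP", "SP", "UP"]
```

UP off-strokes are the only ones carried forward as open hypotheses into the segmentation-recognition lattice, which is why a high detection rate (few UPs) reduces the search space.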

Figure 1. Over-segmentation using BLSTM.

3. Semi-incremental recognition method

3.1. Processing flow

The semi-incremental method, like the incremental method, performs recognition in the background while a user is writing. It propagates the effect of newly written strokes to the recognition of previous strokes by resuming segmentation, recognition, and best-path search. To do so, the method determines a segmentation resuming point (Seg_rp) from which to resume segmentation, and a processing window, termed the scope, in which to update and resume the best-path search. Fig. 2 shows the processing flow of the semi-incremental recognition method.

Figure 2. Flow of semi-incremental recognition.

First, we receive new strokes. Second, we update Seg_rp. Third, we apply segmentation from Seg_rp. Fourth, we determine the scope. Fifth, we update the segmentation-recognition candidate lattice for this scope. Finally, we resume the best-path search from the beginning of this scope to obtain the text recognition result. The segmentation and text recognition results of the scope are used in the next processing cycle.

3.2. Seg_rp determination and segmentation process

At the beginning of each processing cycle, from the result of text recognition up to the latest scope, we update Seg_rp to the candidate segmentation point before a fixed number (N_seg) of the latest recognized words. The segmentation process is then divided into two steps. First, we apply segmentation using the SVM or BLSTM classifier from Seg_rp. Second, we fix UP off-strokes as SP before the N_seg_fix latest recognized words if they match the word segmentation points retrieved from the text recognition result. Both N_seg and N_seg_fix are determined experimentally.

3.3. Determination of scope

To determine the scope, we use the result of the segmentation process. The segmentations of the strokes before and after the method has received new strokes are compared with each other. If there is an off-stroke whose classification has changed (a classification-changed off-stroke), we consider the strokes before the earliest classification-changed off-stroke to be stably classified, while the strokes after it are not. Otherwise, the off-stroke immediately before the newly added strokes is treated as the earliest classification-changed off-stroke. This earliest classification-changed off-stroke may occur within a candidate word block or between two candidate word blocks. We define the scope as the sequence of strokes from the first stroke of the candidate word block containing, or just preceding, the earliest classification-changed off-stroke to the last stroke.

4. Experiments

4.1. Metrics of segmentation evaluation

First, over-segmentation is applied; then segmentation is finalized along with word recognition and the best-path search. We evaluate over-segmentation as well as the final segmentation. The over-segmentation process classifies each off-stroke as an SP, NSP, or UP off-stroke. A UP off-stroke may later be classified as SP or NSP in the text recognition process. The performance of over-segmentation is evaluated by the following measures. Precision is the ratio of correctly classified SP off-strokes to detected SP off-strokes:

  Precision = (# correctly classified SP off-strokes) / (# detected SP off-strokes)    (3)

Recall is the ratio of correctly classified SP off-strokes plus detected UP off-strokes to true SP off-strokes:

  Recall = (# correctly classified SP off-strokes + # detected UP off-strokes) / (# true SP off-strokes)    (4)

Including detected UPs in the numerator is typical for over-segmentation, since UP off-strokes retain the possibility of being classified correctly.

F-measure is calculated from precision and recall:

  F-measure = 2 * Precision * Recall / (Precision + Recall)    (5)

The detection rate is the ratio of detected SP off-strokes to detected SP plus detected UP off-strokes:

  Detection = (# detected SP off-strokes) / (# detected SP off-strokes + # detected UP off-strokes)    (6)

Although UP off-strokes retain the possibility of being classified correctly, and thus increase recall as noted above, they decrease the recognition speed of the system. Therefore, along with the F-measure, we also evaluate over-segmentation by the detection rate.

4.2. Experiment setup

We employ the IAM online database (IAM-OnDB) [13], which consists of pen trajectories collected from 221 different writers using an electronic whiteboard. We follow the handwritten text recognition task IAM-OnDB-t1, in which the database is divided into a training set, two validation sets, and a test set containing 5,364, 1,438, 1,518, and 3,859 written lines, respectively. For language modeling, we use a trigram table extracted from the LOB text corpus.

For segmentation, we train both the SVM and BLSTM classifiers on segmented words of IAM-OnDB. The SVM classifiers use the RBF kernel with a cost factor of 0.1 for high precision of SP determination and 7.5 for high precision of NSP determination. The BLSTM classifier uses a bidirectional layer of 20 LSTM blocks with one cell in each block. After training the BLSTM classifier, based on the distribution of output scores, we set TH1 = 0.1 and TH2 = 0.9.

We compare two semi-incremental recognition systems: the first uses the SVM classifiers for segmentation (SVM system) and the second uses the BLSTM classifier (BLSTM system).

4.3. Over-segmentation

The BLSTM system outperforms the SVM system in both recall and detection rate. As shown in Table 3, recall improved by about 1.7 points, while the detection rate improved markedly from 48.34% to 86.55%.

4.4. Recognition rate

The recognition rates of the SVM system and the BLSTM system for varying values of the N_seg parameter are shown in Fig. 3 and Fig. 4.
The result at each N_seg includes the maximum, minimum, and average recognition rates over runs with the number of strokes per incremental recognition (Ns) varying from 1 to 10. The BLSTM system improves the recognition rate by about 3 points. With its high detection rate, the BLSTM system produces far fewer UPs than the SVM system, which reduces the ambiguity of the best-path search and improves the recognition rate.

Figure 3. Recognition rate of the SVM system.

Figure 4. Recognition rate of the BLSTM system.

4.5. Waiting time

We measure the average waiting time of the two systems for varying Ns. As shown in Fig. 5, the BLSTM system reduces the average waiting time by 36.65% to 62.43% relative to the SVM system.

  Measure     SVM     BLSTM
  Recall      96.91   98.57
  Precision   99.25   99.06
  F-measure   98.07   98.81
  Detection   48.34   86.55

Table 3. Over-segmentation results (%).
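The four measures of Section 4.1 can be computed from off-stroke counts as in the following sketch. The function and variable names are mine, and the example counts are illustrative, not the counts behind Table 3:

```python
# Over-segmentation evaluation measures of Section 4.1, computed from
# off-stroke counts. The example counts are made up for illustration.

def over_segmentation_measures(correct_sp, detected_sp, detected_up, true_sp):
    """correct_sp: detected SP off-strokes that are true SPs;
    detected_sp / detected_up: counts of off-strokes labelled SP / UP;
    true_sp: number of ground-truth segmentation points."""
    precision = correct_sp / detected_sp                       # Eq. (3)
    recall = (correct_sp + detected_up) / true_sp              # Eq. (4)
    f_measure = 2 * precision * recall / (precision + recall)  # Eq. (5)
    detection = detected_sp / (detected_sp + detected_up)      # Eq. (6)
    return precision, recall, f_measure, detection

p, r, f, d = over_segmentation_measures(
    correct_sp=90, detected_sp=95, detected_up=8, true_sp=100)
```

Note that, per Eq. (4), all detected UPs count toward recall because they remain candidates that the later recognition stage can still resolve correctly; the detection rate then penalizes over-reliance on UPs.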

Figure 5. Average waiting time of the two systems.

4.6. CPU time

We also compare the two systems in CPU time per stroke. Fig. 6 shows the results of the BLSTM and SVM systems for varying Ns. The BLSTM system also reduces CPU time by about 50% compared with the SVM system.

Figure 6. CPU time of the two systems.

5. Discussion

The detection rate gives the ratio of segmentation points to all potential segmentation points. For the two systems, at the same recall rate, a higher detection rate reduces the number of undecided points. Since each undecided point doubles the number of candidate word patterns that must be recognized, a high detection rate reduces processing time and waiting time. Each undecided point also doubles the number of search paths. Therefore, a higher detection rate reduces the number of search paths, lowers ambiguity, and improves the recognition rate.

6. Conclusion

In this paper, we proposed a system using a BLSTM recurrent neural network for segmentation of online handwritten English text. Through a large improvement in the detection rate of over-segmentation, BLSTM reduces the number of undecided points, each of which doubles the number of candidate character patterns. This reduction is vital, since character recognition is applied to each candidate. The BLSTM system reduces waiting time by up to 62.34% and CPU time by about 50% compared with the SVM system. Moreover, reducing undecided points also reduces the number of search paths and lowers the ambiguity of recognition, so that BLSTM improves the recognition rate of the system by 3 points from 71.7%.

Acknowledgement

This work is being supported by the Grant-in-Aid for Scientific Research (B)-224300095.

References

[1] C. L. Liu, S. Jaeger, and M. Nakagawa, "Online recognition of Chinese characters: The state-of-the-art," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 2, pp. 198-213, February 2004.
[2] R. Plamondon and S. N. Srihari, "On-line and off-line handwriting recognition: A comprehensive survey," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63-85, January 2000.
[3] B. Zhu, X.D. Zhou, C.L. Liu, and M. Nakagawa, "A robust model for on-line handwritten Japanese text recognition," International Journal on Document Analysis and Recognition, vol. 13, no. 2, pp. 121-131, June 2010.
[4] B. Zhu and M. Nakagawa, "Building a compact online MRF recognizer for large character set by structured dictionary representation and vector quantization technique," Pattern Recognition, vol. 47, no. 3, pp. 982-993, 2014.
[5] B. Zhu and M. Nakagawa, "Segmentation of on-line freely written Japanese text using SVM for improving text recognition," IEICE Transactions on Information and Systems, vol. E91.D, no. 1, pp. 105-113, 2010.
[6] H. Tanaka, "Implementation of real-time box-free online Japanese handwriting recognition system," Japanese Patent 3925247, issued March 13, 2002 (in Japanese).
[7] D.H. Wang, C.L. Liu, and X.D. Zhou, "An approach for real-time recognition of online Chinese handwritten sentences," Pattern Recognition, vol. 45, pp. 3661-3675, 2012.
[8] C.T. Nguyen, B. Zhu, and M. Nakagawa, "A semi-incremental recognition method for on-line handwritten Japanese text," Proc. 12th Int. Conf. on Document Analysis and Recognition, Washington, D.C., USA, Aug. 2013.
[9] C.T. Nguyen, B. Zhu, and M. Nakagawa, "A semi-incremental recognition method for on-line handwritten English text," Proc. 14th Int. Conf. on Frontiers in Handwriting Recognition, Crete, Greece, Sept. 2014.
[10] K. Morik, P. Brockhausen, and T. Joachims, "Combining statistical learning with a knowledge-based approach - A case study in intensive care monitoring," Proc. 16th Int. Conf. on Machine Learning (ICML-99), 1999.
[11] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[12] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5-6, pp. 602-610, July 2005.
[13] M. Liwicki and H. Bunke, "IAM-OnDB - an on-line English sentence database acquired from handwritten text on a whiteboard," Proc. 8th Int. Conf. on Document Analysis and Recognition, pp. 956-961, 2005.