Application of Neural Networks on Cursive Text Recognition

Dr. HABIB GORAINE
School of Computer Science, University of Westminster
Watford Road, Northwick Park, Harrow HA1 3TP, London
UNITED KINGDOM

Abstract: This paper describes an Arabic text recognition system based on neural networks. Text is input using a scanner, and pre-processing is applied to separate lines and words. A processing stage is then applied to each word image, in which thinning, stroke segmentation and feature extraction are performed. The strokes are then classified into eleven primitives using a three-layer neural network trained with the back-propagation algorithm. In the recognition stage the primitives, together with some features, are classified into characters or parts of a character. A secondary stage combines primitives into Arabic characters and resolves ambiguities between pairs and triplets of characters.

Key-Words: Pattern recognition, Arabic character recognition, artificial neural networks, back propagation, skeleton, segmentation.

1. Introduction

If you ever find yourself wasting valuable time keying in pages of typewritten text, computer printout, faxed documents or newspaper articles, you are doing things the hard way. For Latin characters, several good commercial recognition packages can ease the burden of getting text off the paper and into your computer. What about Arabic characters? Arabic character recognition presents a real challenge because Arabic writing is cursive. Over the past decade there has been increasing research on Arabic character recognition [1,2,3]. The field is important not only for Arabic-speaking countries but also for Persian- and Urdu-speaking communities, whose scripts use similar character sets.

In this paper, a text recognition system is presented that is based on previous research [8]. An Arabic text is scanned, and lines and words are separated at high speed. A processing stage is then applied to each word image, in which thinning, stroke separation and feature extraction are performed. The strokes are then classified into eleven primitives using a three-layer neural network trained with the back-propagation algorithm. In the recognition unit the primitives, together with some features, are classified into characters or parts of a character. A secondary classifier combines strokes into characters and resolves ambiguities between pairs and triplets of characters. The steps involved in the process are shown in Figure 1 and described in the following sections.

Figure 1: Stages of the Arabic recognition system: data acquisition (scanning in a page of text), pre-processing (text line separation, word separation), processing (thinning, stroke separation, sampling, stroke representation), stroke classification (neural network), and character classification (features).

2. Preprocessing

In order to isolate words from the text, lines are first separated and then each line is separated into words.

2.1 Separating lines

The horizontal histogram of the text image is computed. The valleys found indicate the gaps between lines, because printed Arabic text is written horizontally and empty lines separate the text lines from each other. The maxima represent the base lines and the minima indicate the interline markers [8].

2.2 Separating words

After the lines have been separated, a method is applied to separate the words of each line. Unlike English cursive script, an Arabic word can be composed of one or more parts separated by a blank space. For this reason, the separation of words within a line is more complex: the system must distinguish between gaps in the vertical histogram of each line that indicate letter boundaries within the same word and those that indicate word boundaries. In order to separate words, each line is scanned from right to left and the width of each gap is determined. A word is separated if the gap exceeds a threshold fixed experimentally. A sketch of this projection-based separation is given below.
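The line and word separation described above is a projection-profile technique: row sums locate the gaps between lines, and column sums, scanned right to left, locate the gaps between words. The following is a minimal sketch of that idea, assuming a binary NumPy image with ink pixels set to 1; the gap threshold value and the function names are illustrative, not taken from the paper.

```python
import numpy as np

def separate_lines(page: np.ndarray) -> list[np.ndarray]:
    """Split a binary page image into line images using the horizontal projection histogram.
    Valleys (rows with no ink) mark the gaps between text lines."""
    row_ink = page.sum(axis=1)                      # horizontal histogram: ink per row
    in_line = row_ink > 0
    lines, start = [], None
    for y, filled in enumerate(in_line):
        if filled and start is None:
            start = y                               # a text line begins
        elif not filled and start is not None:
            lines.append(page[start:y, :])          # the line ends at an empty row
            start = None
    if start is not None:
        lines.append(page[start:, :])
    return lines

def separate_words(line: np.ndarray, gap_threshold: int = 8) -> list[np.ndarray]:
    """Split a line image into word images using the vertical projection histogram.
    The line is scanned right to left (Arabic reading order); a run of blank columns
    wider than `gap_threshold` (fixed experimentally in the paper) is a word boundary,
    while narrower gaps are treated as spaces between parts of the same word."""
    col_ink = line.sum(axis=0)                      # vertical histogram: ink per column
    words, right, left, gap = [], None, None, 0
    for x in range(line.shape[1] - 1, -1, -1):      # right-to-left scan
        if col_ink[x] > 0:
            if right is None:
                right = x                           # rightmost column of a new word
            left = x                                # keep extending the word leftwards
            gap = 0
        else:
            gap += 1
            if right is not None and gap > gap_threshold:
                words.append(line[:, left:right + 1])
                right = None                        # close the current word
    if right is not None:
        words.append(line[:, left:right + 1])       # leftmost word
    return words
```

The words are returned in right-to-left order, matching the scan direction described in the paper.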

3. Processing

In order to recognise an Arabic text, a processing stage is applied that prepares each word image for recognition; it is an important part of the recognition system. The processing involves thinning, stroke segmentation, smoothing and sampling, and stroke representation. Figure 2 shows the results of the processing stage.

3.1 Thinning

The aim of the thinning process is to reduce the word image to a thin line. This makes it possible to create dynamic information, such as the stroke sequence, from static images. Hilditch's method [5], combined with the unification of junctions [9], proved to give very good results and facilitated the stroke segmentation procedure.

3.2 Stroke segmentation

The aim of the stroke segmentation technique is to create artificial time information from a static image (such as the time sequence of pen directions) and to reduce the number of strokes, which proved very efficient when recombining them into characters. The stroke segmentation method consists of breaking down an Arabic word into principal strokes, which are strings of coordinates, and secondary strokes, which are additions to the principal ones. The segmentation algorithm is given in detail in a previous paper [2]. It consists of three distinct, sequentially applied steps:

i. identifying the start-point of the stroke;
ii. identifying the end-point of the stroke;
iii. tracing the stroke from the start-point to the end-point.

Figure 2 shows the strokes obtained from the segmentation process of an Arabic word.

3.3 Smoothing and sampling

This process serves as a filter that eliminates redundant points and retains the minimum number of points needed to recognise characters. The filter consists of a sampling algorithm based on angular segmentation [4]. The algorithm imposes the condition that the direction of the curve between two consecutive sampled points does not exceed a certain threshold angle. Figure 2 shows an example of the sampling process in which only a minimum number of points is kept.

3.4 Stroke representation

In order to feed the pixels to the network, each pair of consecutive sampled points is represented by a segment, so that a string of segment directions represents each stroke. Each stroke is therefore represented by a string of angles; each angle is computed from two consecutive points and then normalised to act as an input vector for the neural network. A sketch of the sampling and representation steps is given below.
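Sections 3.3 and 3.4 reduce each traced stroke to a short string of normalised segment angles. The sketch below illustrates one way to do this, assuming a stroke is a list of (x, y) skeleton coordinates in tracing order; the 20-degree threshold and the normalisation of angles to [0, 1] are illustrative assumptions, not values given in the paper.

```python
import math

def sample_stroke(points: list[tuple[int, int]],
                  angle_threshold_deg: float = 20.0) -> list[tuple[int, int]]:
    """Angular sampling: a point is kept only when the direction of the curve since the
    last kept point changes by more than the threshold angle, so that a minimum number
    of points is retained (Section 3.3)."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    last_dir = math.atan2(points[1][1] - points[0][1], points[1][0] - points[0][0])
    for prev, cur in zip(points[1:], points[2:]):
        direction = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        turn = abs(math.degrees(direction - last_dir))
        turn = min(turn, 360.0 - turn)          # smallest angle between the two directions
        if turn > angle_threshold_deg:
            kept.append(prev)                   # direction changed too much: keep this point
            last_dir = direction
    kept.append(points[-1])
    return kept

def represent_stroke(points: list[tuple[int, int]]) -> list[float]:
    """Stroke representation: each pair of consecutive sampled points defines a segment,
    and the segment's direction angle is normalised to [0, 1] to serve as part of the
    input vector for the neural network (Section 3.4)."""
    angles = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0)            # direction in radians, (-pi, pi]
        angles.append((angle + math.pi) / (2 * math.pi))  # normalise to [0, 1]
    return angles
```

In practice the resulting angle string would be padded or resampled to the fixed number of inputs (nine) that the network described in Section 4 expects.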

4. Stroke classification

The back-propagation neural network architecture with sigmoid transfer function is shown in Figures 3.1 and 3.2. The model has three layers: an input layer, an output layer, and a layer in between called the hidden layer. Each unit in the hidden layer and the output layer is like a perceptron unit, while the units in the input layer serve only to distribute the values they receive to the next layer. The learning rule for the multilayer perceptron is called the generalised delta rule, or the back-propagation rule, and was suggested in 1986 by Rumelhart, Hinton and Williams [10].

Figure 3.1: The multilayer perceptron (input, hidden and output layers). Figure 3.2: The sigmoid transfer function.

The network is operated by showing it a pattern and calculating its response. Comparison with the desired response enables the weights to be altered so that the network produces a more accurate output the next time; the learning rule provides the method for adjusting the weights in the network. When an input pattern is presented to the untrained network, it produces an arbitrary output. An error function that represents the difference between the current output and the desired output is computed. In order to learn successfully, the output of the net must converge towards the desired output, and this is achieved by adjusting the weights on the links between the units. The back-propagation network has separate stages for learning and operation: once the network has been trained, the learning process is stopped and the connection weights are fixed.

Each stroke is presented to the neural network as a string of normalised angles. The network consists of three layers: nine input neurons, four neurons in the hidden layer and eleven neurons in the output layer. A sketch of such a network is given below.
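Section 4 specifies a three-layer perceptron with nine inputs, four hidden units and eleven outputs, sigmoid transfer functions, and training by the generalised delta rule. The following NumPy sketch shows such a network under those assumptions; the learning rate, epoch count and weight initialisation are illustrative choices, since the paper does not report them.

```python
import numpy as np

class StrokeClassifier:
    """Three-layer perceptron (9 inputs, 4 hidden, 11 outputs) with sigmoid units,
    trained by error back-propagation (the generalised delta rule)."""

    def __init__(self, n_in: int = 9, n_hidden: int = 4, n_out: int = 11, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))   # input -> hidden weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))   # hidden -> output weights
        self.b2 = np.zeros(n_out)

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, x: np.ndarray):
        h = self._sigmoid(x @ self.W1 + self.b1)            # hidden activations
        y = self._sigmoid(h @ self.W2 + self.b2)            # output activations
        return h, y

    def train(self, X: np.ndarray, T: np.ndarray, lr: float = 0.5, epochs: int = 2000):
        """X: (n_samples, 9) normalised angle strings; T: (n_samples, 11) one-hot targets."""
        for _ in range(epochs):
            for x, t in zip(X, T):
                h, y = self.forward(x)
                # Output-layer delta: error times the sigmoid derivative y(1 - y).
                delta_out = (y - t) * y * (1.0 - y)
                # Hidden-layer delta, back-propagated through the output weights.
                delta_hidden = (delta_out @ self.W2.T) * h * (1.0 - h)
                # Gradient-descent weight updates on the squared-error function.
                self.W2 -= lr * np.outer(h, delta_out)
                self.b2 -= lr * delta_out
                self.W1 -= lr * np.outer(x, delta_hidden)
                self.b1 -= lr * delta_hidden

    def predict(self, x: np.ndarray) -> int:
        """Return the index (0-10) of the most active output unit, i.e. the primitive."""
        _, y = self.forward(x)
        return int(np.argmax(y))
```

Each training example would pair a stroke's normalised angle string with a one-hot target over the eleven primitives; after training, `predict` returns the index of the winning primitive for an unseen stroke.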

5. Recognition stage

Recognition is achieved in two stages. The first stage classifies each stroke into one of the eleven primitives using the three-layer back-propagation neural network; the second stage builds a description of each character in the form of a feature vector. The features used for character classification are the position of the stroke in the word, the shape of the stroke (which is one of the eleven primitives), the existence of a loop, the number of dots and their position, and the presence of secondary strokes. Finally, some strokes are combined into characters and ambiguities are resolved between pairs of characters using geometrical measurements on the character and the layout context, which covers base-line information and the location of each character with respect to its neighbours.

6. Experimental results

In order to make a comparison with the previous recognition system, the same data was used. It consists of a training set of 60 printed Arabic words written in Naskhi font, with the words written on a horizontal line and not slanted. The test data consisted of another set of 60 words in the same font. These 60 words were tested in three different sizes (large, medium, small), giving about 300 characters in all. The recognition rate was higher than that obtained in the previous research [8].

7. Conclusion

In this paper a new Arabic character recognition system is developed, based on previous research. A back-propagation network is used to classify strokes into one of eleven primitives, combined with some features, in order to recognise Arabic characters. The main idea works, and further research will be based on a cluster of neural networks for the whole system.

References:
[1] G. Auda and H. Raafat, An Automatic Text Reader Using Neural Networks, IEEE, March 1993, pp. 92-95.
[2] H. Goraine, M. Usher and S. Al-Emami, Off-line Arabic Character Recognition of Isolated Arabic Words, IEEE Computer, June 1992.
[3] H. Al-Mualim and S. Yamaguchi, A Method of Recognition of Arabic Cursive Handwriting, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 9, No. 9, pp. 715-722, September 1987.
[4] M. Berthod, Expérimentations sur l'échantillonnage de tracés manuscrits en temps réel, Congrès AFCET-IRIA, Traitement des images et reconnaissance des formes, Gif-sur-Yvette, February 1978.
[5] C. J. Hilditch, Linear Skeletons from Square Cupboards, in Machine Intelligence 4, 1969, pp. 403-420.
[6] B. Hussain and M. R. Kabuka, A Novel Feature Recognition Neural Network and its Application to Character Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 1, January 1994, pp. 98-106.
[7] S. Knerr, L. Personnaz and G. Dreyfus, Handwritten Digit Recognition by Neural Networks with Single-Layer Training, IEEE Transactions on Neural Networks, Vol. 3, No. 6, November 1992, pp. 962-968.
[8] H. Goraine and M. J. Usher, Printed Arabic Text Recognition, ICEMCO, London, October 1994.
[9] S. Al-Emami, Recognition of Handwritten and Typewritten Arabic Characters, PhD thesis, Department of Cybernetics, University of Reading, September 1988.
[10] D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning Internal Representations by Error Propagation, in Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, MA, 1986.