Artificial neural network fusion: Application to Arabic words recognition


Nadir Farah, Mohamed Tarek Khadir and Mokhtar Sellami
Université Badji Mokhtar of Annaba, Département d'Informatique, BP 12, El Hadjar, 23000, Algeria
Emails: {farah, khadir, sellami}@lri-annaba.net

Abstract. The study of multiple classifier systems has recently become an area of intensive research in pattern recognition, aiming to improve on the results of single classifiers. In this work, two types of feature combination for the recognition of handwritten Arabic literal amounts, using neural network classifiers, are discussed. Different parallel combination schemes are presented and their results compared with a single-classifier benchmark using the complete feature set.

Key words: Handwritten Arabic recognition, structural and statistical features, MLP combinations.

1 Introduction

Several successful methods have been developed to recognize isolated handwritten characters and numerals. Nowadays, research is directed at handwritten word recognition [7, 9], which presents a challenge due to the unconstrained nature of handwritten words, including the diversity of character patterns, the ambiguity of characters, and the overlapping of many characters within a single word [1]. Handwriting recognition systems have been studied for decades and many methods have been developed [7]. Some use only the pixel images as input to a statistical or neural classifier. Others preprocess the data in order to extract structural features that are fed into a classifier. The combination of different types of information has been shown to be promising in many pattern recognition systems [11, 5]. Different types of classifiers, different types of features, different types of combiners, etc., may then be considered. In this paper, Arabic word recognition is addressed using two different feature families and three neural network classifiers. The features fall into two families: structural and statistical.

The remainder of this paper is organized as follows: Section 2 presents the characteristics of Arabic writing. Section 3 gives a brief overview of the system architecture. Section 4 describes the feature extraction modules. The three individual classification systems are described in Section 5 and their results in Section 6. Combination approaches are introduced in Section 7 together with their results. The paper concludes with a discussion of the obtained results.

2 Arabic writing characteristics

The Arabic language has a very rich vocabulary. More than 300 million people speak the language, and over 1 billion people use it in several religion-related activities. Arabic script is written from right to left, as opposed to Latin script, which is written from left to right. The Arabic alphabet consists of 28 characters. Ten of them have one dot, three have two dots, and two have three dots; dots can appear above or below the character body. The shape of a character is context sensitive, depending on its location within a word. A letter can have up to four different shapes: isolated, beginning (connected from the left), middle (connected from the left and right), and end (connected from the right). Most of the letters can be connected from both sides, the right and the left; however, six letters can be connected from one side only, the right. This characteristic implies that each word may contain one or more units (sub-words). Examples of Arabic words are given in the lexicon of Arabic literal amounts used on bank checks, Figure 1. Some ligatures involve vertical stacking of characters, a characteristic that complicates the segmentation problem [7], which is not considered in this work.

Figure 1: Bank draft lexicon of Arabic literal amounts

3 The global system architecture

The proposed recognition system has a modular architecture: feature extraction and word classification. First, a preprocessing module binarises and smooths the word image and extracts features.
These extracted features are passed to the MLP classifiers, Figure 2. The shape features come from two sets, statistical features and structural ones; each feature set provides different information about the shape of a word. The first classifier receives the structural features, the second uses both structural and statistical features, and the third the statistical features only. The obtained results are then used by the combiner to produce a final decision.

Figure 2: Global system architecture

4 Feature extraction

Feature extraction has been largely inspired by the human reading process, which considers the global, high-level word shape [7, 9]. Within the holistic paradigm there is a wide range of word recognition methods; they can basically be classified into two categories: statistical and structural. These features are extracted automatically, using different algorithms: contour extraction [8] and diacritical dot detection [10]. The statistical feature set is pixel-based information: the word image is partitioned into zones, and the features are the densities of lit pixels in the various word image regions, obtained from the zone-pattern regions shown in Figure 3(a). The structural description is expressed as a composition of structural units, and a word is recognized by matching its structural representation against those of reference words. The main idea of structural feature extraction is to count the number of ascenders, descenders, loops, etc. Baseline detection [8] provides one of the most important pieces of information, since it allows us to locate the diacritical dot positions and the main part of the word. The extracted features correspond to 9 structural types, Figure 3(b), counted according to their possible numbers of occurrence in the lexicon, Figure 1: 3 for ascenders, 2 for descenders, 2 for one dot above, 2 for two dots above, 2 for three dots above, 1 for one dot below, 2 for two dots below, 3 for loops, and 4 for sub-words. In total, 57 statistical and 21 structural features are distinguished, Figure 3.
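Zone-density features of this kind can be sketched as follows. This is a minimal illustration only: the grid size, function name, and toy image are assumptions of ours, not details from the paper, which uses the 57-zone pattern of Figure 3(a).

```python
import numpy as np

def zone_densities(binary_img, rows, cols):
    """Split a binary word image into a rows x cols grid and return
    the density of lit (foreground) pixels in each zone."""
    feats = []
    # np.array_split tolerates image sizes not divisible by the grid.
    for band in np.array_split(binary_img, rows, axis=0):
        for zone in np.array_split(band, cols, axis=1):
            feats.append(zone.mean())  # fraction of lit pixels in this zone
    return np.array(feats)

# Toy example: an 8x8 "image" whose left half is entirely lit.
img = np.zeros((8, 8))
img[:, :4] = 1
print(zone_densities(img, 2, 2))  # -> [1. 0. 1. 0.]
```

With a 2x2 grid the two left zones have density 1 and the two right zones density 0; the paper's finer partition simply produces a longer vector of such densities.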

Features                   Recognition      Misclassification   Rejection       Reliability
Structural                 87.83% (2108)    1.04% (25)          11.13% (267)    99.00%
Statistical                74.38% (1785)    0.79% (19)          24.83% (596)    99.10%
Statistical + Structural   89.17% (2140)    1.08% (26)          9.75% (234)     99.00%

Table 1: Recognition rates for structural features, statistical features, and both feature families

5 Classification stage

A three-layer Multi-Layer Perceptron (MLP) with a sigmoid activation function has been used for all three modules of Figure 2, trained using the backpropagation algorithm [4]. The number of neurons in the hidden layer is determined by a heuristic. The characteristics of each classifier are given separately in the following sections. The three classifiers have been trained on the same 2400 words; they therefore share the same view of the presented words, and may suggest the same or different word classes.

(a) Statistical features (b) Structural features
Figure 3: Feature extraction for Arabic words

6 Classification results

In this study MLP classifiers have been used; the results obtained on a test set after the classification stage are summarized in Table 1. Reliability is defined as recognition / (100% - rejection) [11]. For the recognition experiments, 4800 word images were used. This set covers the 48 words of our lexicon, Figure 1, written by 100 different writers. Of these word images, a set of 2400 was used to train the classifiers. The rejection criterion is chosen so as to keep the reliability at 99% or above. The recognition rate obtained with the structural feature set is 13.45% higher than the one obtained with the statistical feature set; however, it is still noticeably lower than the recognition rate of the classifier applied to the complete feature set.
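The rates in Table 1 follow directly from the raw counts. The small sketch below (the function name is ours) reproduces the structural row, with the reliability column given by the definition above up to the rounding used in the paper.

```python
def rates(correct, misclassified, rejected):
    """Turn the raw counts of a classifier's decisions into the
    percentage rates used in Table 1."""
    total = correct + misclassified + rejected
    recognition = 100 * correct / total
    rejection = 100 * rejected / total
    # Reliability as defined in the text: recognition / (100% - rejection)
    reliability = 100 * recognition / (100 - rejection)
    return recognition, rejection, reliability

# Counts for the structural feature set (Table 1).
rec, rej, rel = rates(2108, 25, 267)
print(f"recognition {rec:.2f}%, rejection {rej:.2f}%, reliability {rel:.2f}%")
```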

Scheme            Recognition   Misclassification   Rejection   Reliability
Borda count       91.70%        0.86%               7.44%       99.10%
Dempster-Shafer   94.87%        0.97%               4.16%       99.00%
Product           93.90%        0.96%               5.14%       99.00%
Sum               94.93%        0.97%               4.10%       99.00%
Average           93.30%        0.96%               5.74%       99.00%
Max               92.12%        0.96%               6.92%       99.00%
Min               93.20%        0.96%               5.84%       99.00%
Naïve Bayes       93.50%        0.96%               5.54%       99.00%

Table 2: Recognition rates using different statistical combination schemes

The structural feature set has stronger discriminative power and provides better recognition rates than the statistical one, Table 1.

7 Combination

The statistical combination methods are built around two MLP classifiers performing classification separately on the structural and the statistical features; the combination involves the first and third MLPs of Figure 2, described in Section 3. The MLP using both feature sets serves as a comparison benchmark. The first six combination schemes, product, average, maximum, minimum, sum [2] and Dempster-Shafer's evidence theory [11], are applied to the corresponding pairs of classifier outputs to make the final decision. The naive Bayes scheme uses the confusion matrices of the member classifiers to estimate the certainty of each classifier's decision [11]. The Borda count combination is a generalization of the majority vote [3]. All combination methods except the Borda count assume a unique interpretation of the confidence values, for instance as a posteriori probabilities. This is not the case here, due to the specific characteristics of the individual classifiers and their different training sets. For this reason the normalization given in [6] is used, which yields a normalized scale for the output neuron activations. In a neural network, each node of the output layer is associated with one class, and its output O_i, in the [0, 1] range, reflects the response of the network to the corresponding class w_i.
To facilitate the combination, the responses are normalized and used as estimates of the a posteriori probability of each class [6]:

    P(w_i | x) = O_i / Σ_k O_k    (1)

In our experiments, different combination schemes were used; each classifier outputs, for the 48 words of our lexicon, the corresponding confidence values P(w_i | x). Combination results on the 2400 test words are shown in Table 2. The normalized outputs of the two MLPs were used as output confidences. The best recognition rates (above 93.00%) are obtained by six combination
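The normalization of Eq. (1), the fixed combination rules, and the Borda count can be sketched as follows. This is a minimal illustration assuming two classifiers and a toy three-class problem in place of the 48-word lexicon; function names and example values are ours, not from the paper.

```python
import numpy as np

def normalize(outputs):
    """Eq. (1): normalized a posteriori estimates P(w_i|x) = O_i / sum_k O_k."""
    outputs = np.asarray(outputs, dtype=float)
    return outputs / outputs.sum()

def fixed_rule(p1, p2, scheme="sum"):
    """Fixed combination rules over two classifiers' normalized outputs;
    returns the index of the winning class."""
    rules = {
        "sum": p1 + p2,
        "product": p1 * p2,
        "average": (p1 + p2) / 2.0,
        "max": np.maximum(p1, p2),
        "min": np.minimum(p1, p2),
    }
    return int(np.argmax(rules[scheme]))

def borda(p1, p2):
    """Borda count: each classifier ranks the classes; a class earns
    (n - 1 - position) points per ranking, and the highest total wins."""
    n = len(p1)
    scores = np.zeros(n)
    for p in (p1, p2):
        for pos, cls in enumerate(np.argsort(p)[::-1]):  # best class first
            scores[cls] += n - 1 - pos
    return int(np.argmax(scores))

# Toy outputs of the two member classifiers for 3 classes.
structural = normalize([0.1, 0.7, 0.2])   # structural-feature MLP
statistical = normalize([0.3, 0.4, 0.3])  # statistical-feature MLP
print(fixed_rule(structural, statistical, "sum"))  # -> 1
print(borda(structural, statistical))              # -> 1
```

Note that the fixed rules operate on the confidence values themselves, whereas the Borda count discards them and keeps only the induced rankings, which is why it is the one scheme that does not require a common interpretation of the confidences.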

schemes. These results are about 4% better than the recognition rate of the MLP using both feature sets.

8 Conclusion

The combination of two different feature types has been presented in this paper, producing excellent results. The main contribution of this paper is the use of different statistical combination schemes for Arabic word recognition. Our experimental results show that combining single classifiers outperforms a classifier using both feature families. For this particular study, and using this particular set of words, we showed that it is more reliable to build simpler classifiers and combine them than to use a single complex one; in this way, the curse of dimensionality [5] may be avoided. Further investigations may address the feature extraction process and other types of combination schemes, using a larger number of word images.

References

[1] Blumenstein M., Verma B.K., Neural-based solutions for the segmentation and recognition of difficult handwritten words from a benchmark database, Proceedings of the 5th International Conference on Document Analysis and Recognition, Bangalore, India, pp. 281-284, 1999.
[2] Duin R.P.W., The combining classifier: to train or not to train?, Proceedings of the 16th International Conference on Pattern Recognition, Quebec City, Canada, August 11-15, 2002.
[3] Ho T.K., Hull J.J., Srihari S.N., Decision combination in multiple classifier systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 1, pp. 66-75, January 1994.
[4] Jain A.K., Mao J., Mohiuddin K., Artificial neural networks: a tutorial, IEEE Computer, Special Issue on Neural Computing, March 1996.
[5] Jain A.K., Duin R.P.W., Mao J., Statistical pattern recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, January 2000.
[6] Kittler J., Hatef M., Duin R.P.W., Matas J., On combining classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, March 1998.
[7] Madhvanath S., Govindaraju V., The role of holistic paradigms in handwritten word recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, February 2001.
[8] Pavlidis T., Algorithms for Graphics and Image Processing, Rockville, MD: Computer Science Press, 1982.
[9] Steinherz T., Rivlin E., Intrator N., Off-line cursive script word recognition: a survey, International Journal on Document Analysis and Recognition (IJDAR), vol. 2, pp. 90-110, 1999.
[10] Ameur A., Romeo-Pakker K., Miled H., Cheriet M., Approche globale pour la reconnaissance de mots manuscrits Arabes [A global approach for the recognition of handwritten Arabic words], Proc. 3ème Colloque National sur l'Écrit et le Document, pp. 151-156, 1994.
[11] Xu L., Krzyzak A., Suen C.Y., Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, no. 3, May/June 1992.