International Research Journal of Engineering and Technology (IRJET) e-issn:


FEATURE EXTRACTION METHOD BASED ON COMBINE CLASSIFIER FOR MARATHI HANDWRITTEN CHARACTER RECOGNITION

Dr. Ratnashil N Khobragade
Assistant Professor, P G Dept of Computer Science, SGB Amravati University, Amravati, Maharashtra, India.

Abstract: In this paper an MLPNN-based classifier is used for classification of Marathi handwritten characters. For the MLPNN, various transfer functions and learning rules are investigated for different numbers of hidden layers and processing elements. The Scaled Conjugate Gradient (SCG) algorithm is used as the learning rule. A self-generated database of 3700 characters, divided into a training set, a test set and a validation set, is used. The Combine Feature Set Classifier (CFSC) with 240 features is proposed. A training performance of 98.61% is observed using 96 processing elements at 186 epochs with an SSE of 1, and a testing performance of 98.42% is observed using SCG training on 50 characters. A training performance of 88.22% is obtained using 880 processing elements at 5000 epochs with an SSE of 941, and the best testing performance of 84% is observed using SCG training on 3700 characters.

Key Words: Feature Extraction, Handwritten Devanagari character, Marathi character Recognition, Combine Feature Set Classifier, Zoning Density Features (ZDF), Artificial Neural Network.

1. INTRODUCTION

Handwritten character recognition has been a frontier area of research for the past few decades, and there is a large demand for OCR on handwritten documents. This paper concentrates on an MLPNN-based classifier for classification of Marathi handwritten characters. For the MLPNN, various transfer functions and learning rules are investigated for different numbers of hidden layers and processing elements. The database is generated by collecting handwritten symbols from a varied group of people.

Databases of 50 and 3700 characters, each divided into a training set, a test set and a validation set, are used. The Line Classifier Zonal Features (LCZF) method yields 85 features, the Row Column Features (RCF) method 120 features, the Zoning Density Features (ZDF) method 35 features and the Combine Feature Set Classifier (CFSC) 240 features; these are calculated and applied to the MLPNN. The neural network is trained and tested on two databases, one of 50 characters and the other of 3700 characters, by varying the parameters.

2017, IRJET Impact Factor value: 5.181 ISO 9001:2008 Certified Journal Page 2942

2. CHARACTERISTICS OF MARATHI SCRIPT

Marathi is an official language of Maharashtra, and its script is derived from the Devanagari script. It is the 4th most spoken language in India and the 15th most spoken language in the world. Marathi script consists of 16 vowels and 36 consonants, making 52 letters. Marathi is written from left to right and has no upper-case and lower-case characters. Every character has a horizontal line at the top called the header line. The header line joins the characters in a word. Vowels are combined with consonants with the help of specific characteristic marks. These marks occur in line, at the top, or at the bottom of a character in a word [13]. Marathi also has a complex system of compound characters in which two or more consonants are combined to form a new special symbol. In India, huge volumes of historical documents and books (handwritten or printed in Devanagari script) remain to be digitized for better access, sharing, indexing, etc. [14]. The objective of this research is to study handwritten character recognition and explore a multi-feature multi-classifier scheme for handwritten Marathi characters.

3. LITERATURE REVIEW

Recognizing text the way a human does is still a challenging task for a machine. Handwritten character recognition has been a popular research area for many years because of its many potential applications. Developments in optical character recognition of Marathi handwritten characters and Devnagari characters, such as image preprocessing, segmentation, feature extraction, neural network classifiers and their implementation [13], are discussed here.

In [1] M. Hanmandlu et al. present a modified exponential membership function fitted to fuzzy sets for recognition of handwritten Hindi characters, based on features consisting of normalized distances obtained using the Box approach. Sandhya Arora et al. [2] suggested combining multiple feature extraction techniques for handwritten Devnagari character recognition. Satish Kumar [3] suggested a three-tier strategy to recognize the hand-printed characters of Devanagari script using multiple features and a multistage classifier. A recognition rate of 94.2% is achieved with this scheme on a database consisting of more than 25000 characters belonging to 43 alphabets. The invariant-moments technique for handwritten Devnagari vowel recognition presented by R. J. Ramteke [4] is independent of size, slant, orientation, translation and other variations in handwritten vowels. In [5] O V Ramana Murthy and M Hanmandlu divide the character image into a predefined number of zones and compute a feature from each zone based on the pattern (black) pixels contained in that zone. Some of such features are sum squared distance, histogram average pixel density.

In [6] Shailendra Kumar Shrivastava and Pratibha Chaurasia use the energy features of segmented characters for the classification of Devanagari characters using SVM. The best results obtained on DATASET1 are 96% with a linear kernel, 100% with a quadratic kernel, 97% with an RBF kernel and 100% with a polynomial kernel. Dinesh V. Rojatkar et al. [7] investigate an LRTB feature based classifier using a single-hidden-layer feed-forward neural network with five-fold cross validation applied to handwritten Devanagari consonant characters. They found the best network at fold 5 with 80 neurons at trial 3. Analyzing the networks by their confusion matrices reveals greater detail for individual classes. Average classification accuracy on the training, validation, test and combined datasets is 99.40%, 97.38%, 97.05% and 98.98% respectively, on a total dataset of 8224 samples distributed uniformly within 32 classes of typical Devnagari consonants.

In [8] Sushama Shelke and Shaila Apte present a novel approach for recognition of unconstrained handwritten Marathi characters using a multistage feature extraction and classification scheme, and achieve a 95.40% recognition rate. Zoning based feature extraction is proposed by O. V. Ramana Murthy and M. Hanmandlu [9], in which the character image is divided into a predefined number of zones and a feature is computed from each of these zones. In [10] an optimal classifier for the categorization of handwritten Marathi consonant characters using a single-hidden-layer feed-forward neural network with five-fold cross validation is proposed by D V Rojatkar et al.; the overall classification accuracy on the training, validation, test and combined datasets is 99.58%, 97.88%, 97.62% and 99.05% respectively, on a total dataset of 8224 samples distributed uniformly within 32 classes of typical Devnagari consonants.
A detailed survey of preprocessing, segmentation, feature extraction, classification and matching techniques for optical character recognition of general scripts is presented by Ratnashil N Khobragade et al. in [11]. Compound characters are one of the features of Marathi script; in [12] Mrs. Snehal S. Golait and Dr. L.G. Malik present a short review of feature extraction for Marathi handwritten compound character recognition.

4. THE PROPOSED APPROACH AND EXPERIMENTAL RESULT

The Combine Feature Set Classifier (CFSC) with 240 features is proposed in this work. The Scaled Conjugate Gradient (SCG) algorithm and the Gradient Descent & Momentum Adaptive Learning algorithm are used for training the network multiple times. Initially 32 hidden neurons are chosen for experimentation. The number of hidden neurons is then increased up to 128 in steps of 16, i.e. {32, 48, 64, 80, 96, 112 and 128}. In some of the training runs the number of neurons is increased up to 1016. For the classification of characters, different types of Artificial Neural Network (ANN) models are studied and the optimal neural network model is selected by experimentation. In the experimental setup, different combinations of parameters are tested: the number of processing elements, the training algorithm, the feature extraction method and the dataset.

4.1 COMBINE FEATURE SET CLASSIFIER (CFSC)

The Combine Feature Set Classifier (CFSC) feature extraction method is proposed and implemented in this work. The 85 Line Classifier Zonal Features, the 120 Row Column Features and the 35 Zoning Density Features are combined to form a 240-feature vector called the Combine Feature Set Classifier (CFSC). These 240 combined features are used for training the neural network and then for classifying 37 consonant characters of Marathi manuscript. The observations from the training and testing of the CFSC method are described as follows.

The training and testing performance details for the CFSC feature extraction method using SCG training on 50 characters are shown in Table 1, and the graph is shown in Figure 1. The best training performance of 100% is obtained using 112 processing elements at 5000 epochs with an SSE of 0.00099, and the best testing performance of 99.1% is observed. This is the highest training and testing performance observed among all methods for 50 characters using the SCG algorithm.

Table 1: Training and Testing details for CFSC using SCG training on 50 characters.

| Feature Extraction Method | Processing Elements | Best Epochs | Max Epochs | SSE | Gradient | Performance of Training | Performance of Testing |
|---|---|---|---|---|---|---|---|
| CFSC 240 | 32 | 5000 | 5000 | 1.05 | 1.41 | 99.16 | 98 |
| CFSC 240 | 48 | 5000 | 5000 | 0.115 | 0.59 | 99.93 | 98 |
| CFSC 240 | 64 | 5000 | 5000 | 0.0069 | 0.032 | 99.99 | 99 |
| CFSC 240 | 80 | 5000 | 5000 | 0.0021 | 0.003 | 99.99 | 99 |
| CFSC 240 | 96 | 5000 | 5000 | 19.44 | 5.9 | 68.29 | 40 |
| CFSC 240 | 112 | 5000 | 5000 | 0.00099 | 0.0053 | 100 | 99.1 |
| CFSC 240 | 128 | 5000 | 5000 | 0.00099 | 0.0028 | 100 | 99 |
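The way the 240-feature CFSC vector of Section 4.1 is assembled can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the paper gives the three feature counts (85, 120 and 35) but not the extraction details, so `zoning_density` assumes a 7 x 5 zone grid simply because it yields 35 values, and both function names are hypothetical.

```python
import numpy as np

# Feature counts stated in the paper for LCZF, RCF and ZDF.
LCZF_LEN, RCF_LEN, ZDF_LEN = 85, 120, 35


def zoning_density(image, zone_rows=7, zone_cols=5):
    """Return the fraction of foreground pixels in each zone.

    ASSUMPTION: the 7 x 5 grid is chosen only because it gives 35 zones;
    the paper does not state the zone layout.
    """
    image = np.asarray(image, dtype=float)
    row_blocks = np.array_split(np.arange(image.shape[0]), zone_rows)
    col_blocks = np.array_split(np.arange(image.shape[1]), zone_cols)
    densities = [
        image[np.ix_(rows, cols)].mean()
        for rows in row_blocks
        for cols in col_blocks
    ]
    return np.array(densities)


def combine_features(lczf, rcf, zdf):
    """Concatenate the three feature sets into one 240-element vector."""
    parts = [np.asarray(p, dtype=float).ravel() for p in (lczf, rcf, zdf)]
    for part, size in zip(parts, (LCZF_LEN, RCF_LEN, ZDF_LEN)):
        if part.size != size:
            raise ValueError(f"expected {size} features, got {part.size}")
    return np.concatenate(parts)


# Usage with a toy 35 x 35 binary image; the LCZF and RCF vectors are
# zero placeholders because their extraction is not described here.
toy_image = np.zeros((35, 35))
toy_image[:5, :7] = 1  # fill the top-left zone completely
zdf = zoning_density(toy_image)
cfsc_vector = combine_features(np.zeros(LCZF_LEN), np.zeros(RCF_LEN), zdf)
```

Keeping the three blocks in a fixed order (LCZF, RCF, ZDF) means each input of the network always receives the same kind of feature, whichever character is presented.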

Figure 1: Training and testing graph for the CFSC method using the SCG training algorithm on 50 characters.

The training and testing performance details for the CFSC feature extraction method using SCG training on 3700 characters are shown in Table 2, and the graph is shown in Figure 2.

Table 2: Training and Testing details for CFSC using SCG training on 3700 characters.

| Feature Extraction Method | Processing Elements | Best Epochs | Max Epochs | SSE | Gradient | Performance of Training | Performance of Testing |
|---|---|---|---|---|---|---|---|
| CFSC 240 | 32 | 5000 | 5000 | 2432 | 110 | 39.53 | 30 |
| CFSC 240 | 48 | 5000 | 5000 | 2216 | 204 | 48.11 | 46 |
| CFSC 240 | 64 | 5000 | 5000 | 2067 | 82.71 | 53.19 | 53 |
| CFSC 240 | 80 | 5000 | 5000 | 2697 | 71.19 | 25.25 | 20 |
| CFSC 240 | 96 | 5000 | 5000 | 2496 | 174 | 36.83 | 32 |
| CFSC 240 | 112 | 5000 | 5000 | 2133 | 1218 | 51.01 | 48 |
| CFSC 240 | 128 | 5000 | 5000 | 1483 | 199 | 69.77 | 70 |
| CFSC 240 | 256 | 1 | 5000 | 2960 | 0.0024 | 0.002 | 0 |
| CFSC 240 | 384 | 5000 | 5000 | 310 | 6.18 | 94.48 | 94 |
| CFSC 240 | 512 | 5000 | 5000 | 237 | 17.48 | 95.81 | 95 |
| CFSC 240 | 624 | 5000 | 5000 | 184 | 1.13 | 96.75 | 96 |
| CFSC 240 | 752 | 5000 | 5000 | 175 | 1.01 | 96.92 | 96 |
| CFSC 240 | 880 | 5000 | 5000 | 145 | 1.07 | 97.45 | 97 |
| CFSC 240 | 1016 | 5000 | 5000 | 144 | 1.13 | 97.47 | 98.21 |
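The SSE and performance columns of Tables 1 and 2 can be illustrated with a short Python sketch. The paper does not define how either quantity is computed, so this assumes the usual conventions: one-hot target vectors, SSE summed over all outputs and samples, and performance taken as the percentage of samples whose largest network output matches the target class. The function names are hypothetical.

```python
import numpy as np


def sum_squared_error(targets, outputs):
    """Sum of squared differences over all samples and output neurons."""
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return float(np.sum((targets - outputs) ** 2))


def percent_correct(targets, outputs):
    """Percentage of samples whose largest output matches the target class.

    ASSUMPTION: winner-take-all decoding of one-hot targets; the paper
    does not say how its performance figures are obtained.
    """
    predicted = np.argmax(np.asarray(outputs), axis=1)
    actual = np.argmax(np.asarray(targets), axis=1)
    return 100.0 * float(np.mean(predicted == actual))


# Toy example with three classes and three samples (made-up numbers).
toy_targets = np.eye(3)
toy_outputs = np.array([
    [0.9, 0.1, 0.0],  # correct: class 0
    [0.2, 0.7, 0.1],  # correct: class 1
    [0.6, 0.3, 0.1],  # wrong: class 2 predicted as class 0
])
toy_sse = sum_squared_error(toy_targets, toy_outputs)
toy_accuracy = percent_correct(toy_targets, toy_outputs)
```

Under these conventions SSE grows with the number of samples, which would be consistent with the much larger SSE values reported for 3700 characters than for 50.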

Figure 2: Training and testing graph for the CFSC method using the SCG training algorithm on 3700 characters.

The best training performance of 97.47% is obtained using 1016 processing elements at 5000 epochs with an SSE of 144, and the best testing performance of 98.21% is observed. This is the highest testing performance observed among all methods for 3700 characters using the SCG algorithm.

4.2 BEST PERFORMANCE METHOD

After training and testing the methods using various combinations of parameters, the best training performance is shown in Table 3, Figure 3 and Figure 4.

Figure 3: Graph of the best training and testing observed using the SCG training algorithm on 50 and 3700 characters for all methods.

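Table 3: Best Training and Testing using the SCG and Gradient Descent & Momentum Adaptive Learning training algorithms on 50 and 3700 characters for all methods.

| Feature Extraction Method | Training Algorithm | Characters | Processing Elements | Best Epochs | Max Epochs | SSE | Gradient | Performance of Training | Performance of Testing |
|---|---|---|---|---|---|---|---|---|---|
| LCZF | SCG | 50 | 80 | 609 | 5000 | 0.00099 | 0.022 | 99.99 | 99 |
| LCZF | SCG | 3700 | 128 | 5000 | 5000 | 445 | 2.1 | 92.23 | 94 |
| LCZF | Gradient Descent & Momentum Adaptive Learning | 50 | 96 | 235 | 5000 | 0.98 | 0.19 | 99.99 | 99 |
| LCZF | Gradient Descent & Momentum Adaptive Learning | 3700 | 80 | 5000 | 5000 | 0.48 | 0.63 | 99.98 | 98 |
| RCF | SCG | 50 | 112 | 2689 | 5000 | 0.001 | 0.004 | 100 | 99 |
| RCF | SCG | 3700 | 80 | 7000 | 7000 | 1661 | 333 | 65.21 | 68 |
| RCF | Gradient Descent & Momentum Adaptive Learning | 50 | 48 | 10 | 5000 | 35.69 | 7.75 | 0.2 | 0 |
| RCF | Gradient Descent & Momentum Adaptive Learning | 3700 | 48 | 1 | 5000 | 2960 | 1.17 | 9.8 | 2 |
| ZDF | SCG | 50 | 96 | 58 | 5000 | 0.00077 | 0.00091 | 100 | 99 |
| ZDF | SCG | 3700 | 384 | 5000 | 5000 | 433 | 2.72 | 92.21 | 92 |
| ZDF | Gradient Descent & Momentum Adaptive Learning | 50 | 96 | 233 | 5000 | 0.00099 | 0.00029 | 99.99 | 99 |
| ZDF | Gradient Descent & Momentum Adaptive Learning | 3700 | 48 | 1 | 5000 | 2960 | 7.861 | 0.0014 | 0 |
| CFSC | SCG | 50 | 112 | 5000 | 5000 | 0.00099 | 0.0053 | 100 | 99.1 |
| CFSC | SCG | 3700 | 1016 | 5000 | 5000 | 144 | 1.13 | 97.47 | 98.21 |
| CFSC | Gradient Descent & Momentum Adaptive Learning | 50 | 80 | 122 | 5000 | 35.99 | 7.98 | 63.02 | 60 |
| CFSC | Gradient Descent & Momentum Adaptive Learning | 3700 | 96 | 1 | 5000 | 2960 | 2.24 | 3.51 | 1 |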

Figure 4: Graph of the best training and testing using the SCG and Gradient Descent & Momentum Adaptive Learning training algorithms on 50 and 3700 characters for all methods.

The best training performance of 100% and testing performance of 99.1% are observed with 112 PE at 5000 epochs with an SSE of 0.00099 for the CFSC feature extraction method using SCG training on 50 characters. The best training performance of 97.47% and best testing performance of 98.21% are observed with 1016 PE at 5000 epochs with an SSE of 144 for the CFSC feature extraction method on 3700 characters using the SCG algorithm, as shown in Table 4 and Figure 5.

Table 4: Best Training and Testing using the SCG training algorithm on 50 and 3700 characters.

| Feature Extraction Method | Training Algorithm | Characters | Processing Elements | Best Epochs | Max Epochs | SSE | Gradient | Performance of Training | Performance of Testing |
|---|---|---|---|---|---|---|---|---|---|
| CFSC | SCG | 50 | 112 | 5000 | 5000 | 0.00099 | 0.0053 | 100 | 99.1 |
| CFSC | SCG | 3700 | 1016 | 5000 | 5000 | 144 | 1.13 | 97.47 | 98.21 |
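The selection behind the 50-character row of Table 4 can be reproduced from the Table 1 results with a few lines of Python. The sweep values and the result rows are copied from the paper; the selection rule (highest testing performance, ties broken by training performance) and the name `best_by_testing` are assumptions made for illustration, since the paper does not state how the "best" configuration is chosen.

```python
# Hidden-layer sizes swept in Section 4: 32 to 128 in steps of 16.
pe_grid = list(range(32, 129, 16))

# (processing elements, SSE, training %, testing %) copied from Table 1.
table1_rows = [
    (32, 1.05, 99.16, 98.0),
    (48, 0.115, 99.93, 98.0),
    (64, 0.0069, 99.99, 99.0),
    (80, 0.0021, 99.99, 99.0),
    (96, 19.44, 68.29, 40.0),
    (112, 0.00099, 100.0, 99.1),
    (128, 0.00099, 100.0, 99.0),
]


def best_by_testing(rows):
    """Pick the row with the highest testing performance.

    ASSUMPTION: ties are broken by training performance; the paper does
    not describe its selection rule.
    """
    return max(rows, key=lambda row: (row[3], row[2]))


best_row = best_by_testing(table1_rows)
best_pe = best_row[0]  # number of processing elements in the best row
```

With the Table 1 values this picks the 112-processing-element network, matching the CFSC row for 50 characters in Table 4.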

Figure 5: Graph of the best training and testing using the SCG training algorithm on 50 and 3700 characters.

5. CONCLUSION AND FUTURE SCOPE

In this work we have evaluated the performance of MLPNN-based classifiers for classification of Marathi handwritten characters using the Combine Feature Set Classifier (CFSC) feature extraction method, and the results are found to be satisfactory. For the MLPNN, various learning rules and transfer functions are investigated for different numbers of hidden layers and processing elements. The Scaled Conjugate Gradient algorithm is used as the default learning rule. The training and testing performance of the proposed feature extraction method using the SCG training algorithm on 50 and 3700 characters is observed. A training performance of 98.61% is observed using 96 processing elements at 186 epochs with an SSE of 1, and a testing performance of 98.42% is observed using SCG training on 50 characters. A training performance of 88.22% is obtained using 880 processing elements at 5000 epochs with an SSE of 941, and the best testing performance of 84% is observed using SCG training on 3700 characters. The feature extraction method could be further changed and analyzed to reduce the dimensionality and computational complexity. The work can be extended to recognition of words and complete sentences for Marathi manuscript and for other scripts.

REFERENCES

[1] M. Hanmandlu, O.V. Ramana Murthy, Vamsi Krishna Madasu, "Fuzzy Model based recognition of handwritten Hindi characters", Digital Image Computing Techniques and Applications, 0-7695-3067-2/07 $25.00, IEEE 2007.

[2] Sandhya Arora et al., "Combining Multiple Feature Extraction Techniques for Handwritten Devnagari Character Recognition", Region 10 Colloquium and the Third ICIIS, Kharagpur, INDIA, December 8-10, IEEE 2008.

[3] Satish Kumar, "Devanagari Hand-printed Character Recognition using Multiple Features and Multistage Classifier", International Journal of Computer Information Systems and Industrial Management Applications (IJCISIM), ISSN: 2150-7988, Vol. 2, pp. 039-055, 2010.

[4] R. J. Ramteke, "Invariant Moments Based Feature Extraction for Handwritten Devanagari Vowels Recognition", International Journal of Computer Applications (0975-8887), Volume 1, No. 18, 2010.

[5] O V Ramana Murthy, M Hanmandlu, "Zoning based Devanagari Character Recognition", International Journal of Computer Applications (0975-8887), Volume 27, No. 4, August 2011.

[6] Shailendra Kumar Shrivastava, Pratibha Chaurasia, "Handwritten Devanagari Lipi using Support Vector Machine", International Journal of Computer Applications (0975-8887), Volume 43, No. 20, April 2012.

[7] Dinesh V. Rojatkar, Krushna D. Chinchkhede, G.G. Sarate, "Design and Analysis of LRTB feature based Classifier applied to Handwritten Devnagari Characters: A Neural Network Approach", International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 97-101, 2013.

[8] Sushama Shelke, Shaila Apte, "A Novel Multi-feature Multi-Classifier Scheme for Unconstrained Handwritten Devanagari Character Recognition", 12th International Conference on Frontiers in Handwriting Recognition, 2010.

[9] O. V. Ramana Murthy, M. Hanmandlu, "Zoning based Devanagari Character Recognition", International Journal of Computer Applications (0975-8887), Volume 27, No. 4, pp. 21-25, August 2011.

[10] Dinesh V. Rojatkar, Krushna D. Chinchkhede, G.G. Sarate, "Handwritten Devnagari Consonants Recognition using MLPNN with Five Fold Cross Validation", International Conference on Circuits Power and Computing Technologies (ICCPCT), pp. 1222-1226, IEEE 2013.

[11] Ratnashil N Khobragade, Dr. Nitin A. Koli, Mahendra S Makesar, "A Survey on Recognition of Devnagari Script", International Journal of Computer Applications & Information Technology (IJCAIT), Vol. II, Issue I, January 2013.

[12] Mrs. Snehal S. Golait, Dr. L.G. Malik, "Review on Feature Extraction Technique for Handwritten Marathi Compound Character Recognition", 2013 Sixth International Conference on Emerging Trends in Engineering and Technology, 978-1-4799-2560-5/13, IEEE 2013.

[13] Ratnashil N Khobragade, Dr. Nitin A. Koli, Mahendra S Makesar, "Zoning Density Based Feature Extraction For Recognition Of Marathi Handwritten Character", International Research Journal Of Engineering And Technology (IRJET), Volume: 02, Issue: 04, pp. 1819-1823, July 2015.

[14] Ratnashil N Khobragade, Dr. Nitin A. Koli, Mahendra S Makesar, "Analysis of Methods for Recognition of Devnagari Script", International Journal of Pure and Applied Research in Engineering and Technology (IJPRET), Volume 2 (8), pp. 27-38, 01-04-2014.