An OCR System for Printed Nasta'liq Script: A Segmentation Based Approach

Saeeda Naz, Arif Iqbal Umar, Saad Bin Ahmed, Syed Hamad Shirazi, M. Imran Razzak, Imran Siddiqi

Department of Information Technology, Hazara University, Mansehra, Pakistan
Higher Education Department, KPK, Pakistan
King Saud Bin Abdul Aziz University for Health Sciences, Riyadh, Saudi Arabia
Department of Computer Science, Bahria University Islamabad, Pakistan
{saeedanaz292, isaadahmed, mirpak, syedhamad, imransidiqqi}@gmail.com, arifiqbalumar@gmail.com

Abstract

Machine simulation of human reading has been a subject of intensive research for almost four decades. Automatic Urdu character recognition remains a challenging task because of the cursive nature of the script, even though recent improvements in recognition methods and systems for Latin script are very promising. This work introduces a robust approach based on statistical models that provides a solution for recognition of Urdu text in the Nasta'liq style. Contrary to classical approaches, which segment text into words, ligatures or characters, we intend to employ an implicit segmentation in which text lines are recognized during segmentation. The developed system will be evaluated on standard Urdu text databases and compared with the state-of-the-art recognition techniques proposed to date.

I. INTRODUCTION

Most people learn to read and write during their first few years of education. By the time they have grown out of childhood, they have already acquired very good reading and writing skills, including the ability to read most texts, whether handwritten or printed, in different fonts and styles. Most people have no problem reading light or heavy print, upside-down print, advertisements in fancy font styles, calligraphic text, or characters with flowery ornaments and missing parts.
In contrast, despite more than four decades of intensive research, the reading skill of the computer is still far behind that of a human. In recent years there has been a steady demand for cursive and non-cursive Optical Character Recognition (OCR) systems, not only so that native speakers can readily use OCR on their mobile phones and tablets, but also for the digitization of a large amount of legacy documents, such as holy books, magazines, newspapers, poetry books, and handwritten documents. Although a computationally intensive field, OCR has witnessed significant improvement over the years, mainly due to the tremendous advances in computational intelligence algorithms.

The objective of character recognition is to imitate the human reading ability, with human accuracy but at a far higher speed. The target performance is at least five characters per second with a 99.9% recognition rate [1]. OCR is the most important component of various applications, such as document automation, verification of cheques, data entry, reading machines for the visually impaired, and a large variety of other banking and business applications. These applications perform well only if the characters in text images are classified and recognized accurately. OCR is an active area of research, and its importance is well established with respect to the disciplines of digital image processing, pattern recognition, artificial intelligence, database systems, natural language processing, human-machine interaction, and communications.

Most commercial OCR applications are concerned with machine-printed Latin scripts that have well-separated characters. OCR systems for printed Japanese and Chinese are also quite mature.
Languages such as Arabic, Persian, Urdu, and Pashto are written in scripts derived from the Arabic script and are read, written and spoken by a considerable proportion of the world's population. There are many font styles of the cursive script: Nasta'liq, Kufi, Thuluth, Diwani, Riq'a, and Naskh, to name a few. Among these, Naskh and Nasta'liq are the most important; the former is preferred for Arabic, Persian, and Pashto, and the latter is adopted for Urdu typesetting. Some commercial OCRs are available for printed Arabic characters, but they have many technical problems, especially in the segmentation stage, where the results are unsatisfactory. For all practical purposes, the Urdu script is a superset of its Arabic and Persian counterparts.

Recognition of printed Arabic text has received considerable research attention, whereas surveys on recognition of Urdu text [2-5] reveal that very limited research effort has gone into the development of an Urdu OCR. This may be due to the complexities involved in the Nasta'liq writing style [6]. Although Urdu and Arabic share many common attributes, the techniques developed for recognition of Arabic text cannot be directly applied to Urdu text, because the Nasta'liq writing style is more complex than the Naskh style used for Arabic. Challenges in recognition of Nasta'liq Urdu text include diagonality, multiple baselines, high cursiveness and context sensitivity.

Most of the studies on recognition of Urdu text use ligatures as the basic unit of recognition [31-32, 40-43]. The total number of unique Urdu ligatures is approximately 22,000 [39], and training classifiers to discriminate between such a large number of classes is a challenging task. Many studies therefore use only a small subset consisting of the most frequently used ligatures.
Among the well-known ligature based approaches, the study presented by Javed and Hussain [10] on offline Urdu OCR is evaluated on 1500 ligatures from a set of 5,000 frequently occurring ligatures comprising one to eight characters. Hidden Markov models are trained on these ligatures with the HTK toolkit using DCT features, and a recognition rate of 93% is reported. The number of recognized ligatures (approximately 1500) is very small compared to the total number of unique Urdu ligatures (approximately 22,000). Moreover, ligature based approaches are limited in the sense that new ligatures on which the system has not been trained cannot be recognized.

ISBN: 978-1-4799-5754-5/14/$26.00 2014 IEEE

To overcome the problem of a large number of classes in ligature based approaches, one solution is to segment the ligatures into characters and train classifiers to recognize the characters. This reduces the number of classes from the total number of unique ligatures to the total number of characters and their different shapes. The segmentation of ligatures into characters, however, is itself a challenging and error-prone task [16]. To overcome these issues with ligature based and segmentation based recognition, a new trend is to employ implicit segmentation techniques, in which the text is recognized during the segmentation phase itself.

Moreover, the limited work on Urdu OCR reported in the literature has mostly been evaluated on non-standard datasets, with researchers generating their own datasets to evaluate the proposed techniques. This makes an objective comparison of different methods difficult.

The main goal of the proposed research is to develop an implicit segmentation based Optical Character Recognition system for printed Urdu text written in the Nasta'liq font. The paper is organized as follows. Section II presents the related work in the field of OCR, its motivation and the research problem. Section III discusses the general steps involved in the development of an OCR along with the notable contributions to recognition of Urdu text, followed by a discussion of our intended methodology. This section also presents the dataset and the evaluation metric we plan to work with. Finally, we conclude the paper with some remarks and our future plan of study.

II. RELATED WORK

Character recognition techniques associate a Unicode code point with the image of a character. Based on the mode of input, OCR is classified as offline or online, as illustrated in Fig. 1 [7, 8]. Offline OCR deals with digitized images of text, either handwritten or machine printed. The digital image of the text can be obtained from an optical scanner or a camera. In contrast, in online OCR the input text is written directly using a tablet, a PDA, or a stylus. Online character recognition is probably easier than its offline counterpart because more information is available, such as time information, stroke coordinates, and the handwriting style of the user.

Fig. 1. Types of OCR

A typical OCR system mainly comprises a combination of the following modules.

- Image acquisition
- Preprocessing
- Segmentation
  - Segmentation free/holistic approach
  - Segmentation based/analytical approach
    - Explicit segmentation
    - Implicit segmentation
- Feature extraction
  - Structural features
  - Statistical features
- Recognition/classification
- Post-processing

The images of printed or handwritten documents are acquired using a scanner, a camera or a digitizing tablet and are pre-processed before they are fed to the subsequent modules. Pre-processing typically involves binarization, skew and slant detection and correction, noise removal, and separation of text and non-text objects [9-14]. Depending on the type of approach, the segmentation step involves splitting the text into lines, words, ligatures, characters or strokes. This step is more crucial in cursive scripts such as Arabic, Urdu, Persian, Pashto, Sindhi, Malay (Jawi) and Uyghur.

As discussed earlier, recognition techniques rely on either segmentation-based or segmentation-free approaches [15-16]. In segmentation-free or holistic approaches, the system seeks to recognize the ligature or word as a whole without segmenting it further into characters or sub-images.
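Splitting a page into text lines, the coarsest of the segmentation levels listed above, is commonly done with a horizontal projection profile: the number of ink pixels in each pixel row is counted, and runs of non-empty rows are taken as lines. The following is a minimal sketch under stated assumptions, not the authors' implementation: the image is assumed to be already binarized (1 = ink, 0 = background), and the function names and the `min_ink` parameter are hypothetical, introduced only for illustration.

```python
def horizontal_projection(image):
    """Count ink pixels in each row of a binary image (1 = ink, 0 = background)."""
    return [sum(row) for row in image]


def split_into_lines(image, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A row belongs to a line when its ink count is at least `min_ink`
    (a hypothetical threshold); consecutive such rows form one line.
    """
    profile = horizontal_projection(image)
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # first inked row of a new line
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 7x6 "page" with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(split_into_lines(page))  # [(1, 3), (5, 6)]
```

A plain profile like this assumes that blank rows separate the lines. In Nasta'liq text, diacritics and diagonal ligatures can reach into neighbouring lines, so a practical system would likely need additional heuristics on top of it.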
Generally, paragraphs of text are split into lines using horizontal projection or heuristics based methods. Text lines are then split into words or sub-words (ligatures) using vertical projections, connected component labeling and similar techniques [17].

In segmentation-based or analytical approaches, ligatures are segmented into characters or strokes, either explicitly or implicitly. In explicit segmentation, the words or ligatures are divided into characters or strokes [16]. Incorrect segmentation leads to misclassifications and results in reduced recognition rates. Correct segmentation of ligatures is in fact the major challenge in explicit segmentation based approaches [18-20]. In implicit segmentation, words or ligatures are segmented into smaller units while being recognized, without any exact splitting path. Implicit segmentation is also termed straight or recognition based segmentation. These methods scan the text images line by line from right to left and segment words into characters during or after recognition using codebook entries, predefined classes or a set of features [20-24]. These approaches have been effective on highly cursive scripts and can also be employed in the development of a multilingual OCR.

Segmentation is followed by the feature extraction step, and a wide variety of statistical as well as structural features have been investigated in the literature. Structural features are typically computed by finding the extreme points and joining points [25] or by considering the number of dots, the position of the dots, the presence of branches, loops or secondary strokes, and the slope between the initial point and the final point [26, 32]. Statistical features, for which rich classifiers are available, are mostly preferred over structural features, and a large number of techniques rely on statistical features including shape descriptors, contour based statistics, edge based features and other statistical measurements computed at word, ligature or character level [26-30].

For recognition, a number of classifiers have been extensively used, including hidden Markov models (HMM) [16, 24, 33, 34], artificial neural networks (ANN) [20, 25, 31, 36], support vector machines (SVM), nearest neighbor classifiers (NN) or template matching [32-33], and decision tree classifiers [37]. In some cases, the classification step is followed by post-processing [9, 10] to improve the overall recognition accuracy of the system.

Having discussed the general steps involved in an OCR system, we present the proposed solution in the next section.

III. PROPOSED SOLUTION

Our study is aimed at developing a robust optical character recognition system based on implicit segmentation. The main steps involved in our work are likely to include the following.

- Acquisition of printed Urdu text from the UPTI database employed in our study.
- Extraction and selection of the features which provide the best recognition rates for implicit segmentation based Urdu OCR for printed text.
- Recognition using state-of-the-art classifiers such as recurrent neural networks, hidden Markov models, classifiers based on fuzzy logic, or conditional random fields (CRF).

A. Overview of Proposed System

We intend to work on scanned images of text from the UPTI dataset. The pre-processing in our case will comprise the traditional steps of de-noising, skew detection and correction, and binarization. The text page will be segmented into lines using horizontal projection profiles complemented with some heuristics. We intend to employ implicit segmentation and use a set of statistical features. Features such as projections and profiles, chain codes and zone based statistical measures can be explored. Classification can be carried out using neural networks or hidden Markov models, while a language model can also be integrated with the system to improve the overall recognition rates through dictionary validation. An overview of the intended methodology is presented in Fig. 2. The system is planned to be developed in MATLAB/Python on the Windows platform. Some of our efforts are reported in [44, 45].

Fig. 2. Proposed System

B. Dataset

Most of the existing Urdu OCR systems have been evaluated on custom developed databases. This makes a quantitative comparison of different methods difficult. The Image Understanding and Pattern Recognition Group (IUPR) at the Technical University of Kaiserslautern, Germany, generated synthetic data of Urdu Nasta'liq text from leading Urdu newspapers of Pakistan and termed it the UPTI dataset. We plan to evaluate our system on this standard dataset.

C. Measurement Metric

The developed recognition system is planned to be evaluated using graph edit distance. The character level accuracy will be computed using:

\[
\text{accuracy} = 100 \times \left( 1 - \frac{\text{insertions} + \text{substitutions} + \text{deletions}}{\text{total length of test set transcription}} \right)
\]

IV. CONCLUSION

This paper proposed an OCR system for printed Urdu text in Nasta'liq script based on implicit segmentation.
A set of statistical features will be extracted and fed to the classifier for recognition. The developed technique will also be evaluated on the standard UPTI database and will be compared with existing state-of-the-art Urdu OCRs.

REFERENCES

[1] V. Govindan and A. Shivaprasad, "Character Recognition-A Review," Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.
[2] S. Naz, K. Hayat, M.I. Razzak, M.W. Anwar, S.A. Madani and S.U. Khan, "The optical character recognition of Urdu-like cursive scripts," Pattern Recognition, vol. 47, no. 3, pp. 1229-1248, 2014.

[3] S. Naz, K. Hayat, M.I. Razzak, M.W. Anwar, and H. Akbar, "Arabic script based character segmentation: A review," In Computer and Information Technology (WCCIT), World Congress on, pp. 1-6, 2013.
[4] S. Naz, K. Hayat, M.I. Razzak, M.W. Anwar, and S.Z. Khan, "Challenges in Baseline Detection of Arabic Script Based Languages," Springer International Publishing in Intelligent Systems for Science and Information, pp. 181-196, 2014.
[5] S. Naz, K. Hayat, M.I. Razzak, M.W. Anwar, and H. Akbar, "Challenges in baseline detection of cursive script languages," In Science and Information Conference (SAI), pp. 551-556, 2013.
[6] S. Naz, K. Hayat, M.I. Razzak, M.W. Anwar, and H. Akbar, "Arabic script based language character recognition: Nasta'liq vs Naskh analysis," In Computer and Information Technology (WCCIT), World Congress on, pp. 1-7, 2013.
[7] B. Al-Badr and S. A. Mahmoud, "Survey and Bibliography of Arabic Optical Text Recognition," Signal Processing, vol. 41, no. 1, pp. 49-77, 1995.
[8] L.M. Lorigo and V. Govindaraju, "Online Arabic Handwriting Recognition: A Survey," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 5, pp. 712-724, 2006.
[9] M. Naz, Q. U. A. Akram and S. Hussain, "Binarization and its Evaluation for Urdu Nastalique Document Images," Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, Pakistan, 2014.
[10] F. Shafait, D. Keysers, and T. M. Breuel, "Layout analysis of Urdu document images," In Multitopic Conference, pp. 293-298, 2006.
[11] R.J. Ramteke and I. K. Pathan, "Noise Reduction in Urdu Document Image Spatial and Frequency Domain Approaches," In Proc. 4th International Conference on Signal and Image Processing 2012 (ICSIP'12), volume 222 of Lecture Notes in Electrical Engineering, pp. 443-452, Springer India, 2013.
[12] D.S. Le, G. R. Thoma, and H. Wechsler, "Automated Page Orientation and Skew Angle Detection for Binary Document Images," Pattern Recognition, vol. 27, no. 10, pp. 1325-1344, 1994.
[13] R.J. Ramteke, K. P. Imran, and S. C. Mehrotra, "Skew Angle Estimation of Urdu Document Images: A Moments based Approach," International Journal of Machine Learning and Computing, vol. 1, no. 1, pp. 7-12, 2011.
[14] S. F. Rashid, S. S. Bukhari, F. Shafait, and T. M. Breuel, "A Discriminative Learning Approach for Orientation Detection of Urdu Document Images," In Proc. 13th International Multitopic IEEE Conference (INMIC'09), pp. 1-5, 2009.
[15] S.T. Javed and S. Hussain, "Improving Nastalique Specific Pre-Recognition Process for Urdu OCR," In Proc. 13th International Multitopic IEEE Conference (INMIC'09), pp. 1-6, 2009.
[16] S. T. Javed, S. Hussain, A. Maqbool, S. Asloob, S. Jamil, and H. Moin, "Segmentation Free Nastalique Urdu OCR," World Academy of Science, Engineering and Technology, vol. 46, pp. 456-461, 2010.
[17] B. Al-Badr and S. A. Mahmoud, "Survey and Bibliography of Arabic Optical Text Recognition," Signal Processing, vol. 41, no. 1, pp. 49-77, 1995.
[18] A.M. Zeki, "The Segmentation Problem in Arabic Character Recognition The State of the Art," In Proc. 1st International Conference on Information and Communication Technologies (ICICT'05), pp. 11-26, 2005.
[19] Y.M. Alginahi, "A survey on Arabic character segmentation," International Journal on Document Analysis and Recognition, pp. 1-22, 2012.
[20] Z. Ahmad, J. K. Orakzai, and I. Shamsher, "Urdu Compound Character Recognition using Feed Forward Neural Networks," In Proc. 2nd International Conference on Computer Science and Information Technology (ICCSIT'09), pp. 457-462, 2009.
[21] A. Ul-Hasan, S. B. Ahmed, F. Rashid, F. Shafait, and T. M. Breuel, "Offline Printed Urdu Nastaleeq Script Recognition with Bidirectional LSTM Networks," 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 1061-1065, 2013.
[22] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, "A Novel Connectionist System for Unconstrained Handwriting Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, 2009.
[23] M.S. Khorsheed, "Recognising handwritten Arabic manuscripts using a single hidden Markov model," Pattern Recognition Letters, vol. 24, pp. 2235-2242, 2003.
[24] M.S. Khorsheed, "Offline recognition of omnifont Arabic text using the HMM ToolKit (HTK)," Pattern Recognition Letters, vol. 28, pp. 2235-2242, 2007.
[25] I. Shamsher, Z. Ahmad, J. K. Orakzai, and A. Adnan, "OCR for Printed Urdu Script using Feed Forward Neural Network," Proc. World Academy of Science, Engineering and Technology, vol. 23, pp. 172-175, 2007.
[26] R.G. Casey and G. Nagy, "Recursive Segmentation and Classification of Composite Character Patterns," In Proc. 6th International Conference on Pattern Recognition, vol. 2, pp. 1023-1026, 1982.
[27] F. Hussain and J. Cowell, "Extracting Features from Arabic Characters," In Proc. 2nd International Conference on Computer Graphics and Imaging (CGIM'01), pp. 201-206, 2001.
[28] J. Cowell and F. Hussain, "A Fast Recognition System for Isolated Arabic Characters," In Proc. 6th International Conference on Information Visualisation, pp. 650-654, London, UK, 2002.
[29] A. Muaz, "Urdu Optical Character Recognition System," Master's thesis, National University of Computer & Emerging Sciences, Lahore, Pakistan, 2010.
[30] S.A. Hussain, S. Zaman, and M. Ayub, "A Self Organizing Map based Urdu Nasakh Character Recognition," In Proc. International Conference on Emerging Technologies (ICET'09), pp. 267-273, 2009.
[31] D.B. Megherbi, S. M. Lodhi, and A. J. Boulenouar, "Fuzzy-Logic-Model-based Technique with Application to Urdu Character Recognition," Proc. SPIE Applications of Artificial Neural Networks in Image Processing, vol. 3962, pp. 13-24, 2000.
[32] Z.A. Shah, "Ligature based Optical Character Recognition of Urdu-Nastaleeq Font," In Proc. 6th International Multitopic IEEE Conference (INMIC'02), 2002.
[33] S.A. Husain, "A Multi-Tier Holistic Approach for Urdu Nastaliq Recognition," In Proc. 6th International Multitopic IEEE Conference (INMIC'02), pp. 528-532, 2002.
[34] M. Decerbo, E. MacRostie, and P. Natarajan, "The BBN Byblos Pashto OCR system," In Proc. 1st ACM Workshop on Hardcopy Document Processing (HDP '04), pp. 29-32, 2004.
[35] R. Safabakhsh and P. Adibi, "Nastaaligh Handwritten Word Recognition Using a Continuous-Density Variable-Duration HMM," The Arabian Journal for Science and Engineering, vol. 30, no. 1B, pp. 95-118, 2005.
[36] S.N. Nawaz, M. Sarfraz, A. Zidouri, and W. G. Al-Khatib, "An Approach to Online Arabic Character Recognition using Neural Networks," In Proc. 10th International Conference on Electronics, Circuits and Systems (ICECS'03), vol. 3, pp. 1328-1331, 2003.
[37] U. Pal and A. Sarkar, "Recognition of Printed Urdu Script," In Proc. Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), pp. 1183-1187, 2003.
[38] D.S. Guru, S. K. Ahmed, and K. Irfan, "An Attempt towards Recognition of Handwritten Urdu Characters: A Decision Tree Approach," In Proc. National Conference on Computers and Information Technology (NCCIT'01), pp. 75-83, 2001.
[39] A.M. Jamil, "Noori Nastaliq Revolution in Urdu composing," book, ELITE PUBLISHERS LTD for NOORI NASTALIQ FOUNDATION, 2008.
[40] S.A. Sattar, "A Technique for the Design and Implementation of an OCR for Printed Nastalique Text," PhD thesis, NED University of Engineering & Technology, Karachi, Pakistan, 2009.
[41] U. Iftikhar, "Recognition of Urdu Ligatures," Master's thesis, VIBOT Consortium and German Research Center for Artificial Intelligence (DFKI), 2011.
[42] N. Sabbour and F. Shafait, "A segmentation-free approach to Arabic and Urdu OCR," In IS&T/SPIE Electronic Imaging, pp. 86580N-86580, International Society for Optics and Photonics, 2013.

[43] G.S. Lehal and A. Rana, "Recognition of Nastalique Urdu Ligatures," In Proceedings of the 4th International Workshop on Multilingual OCR, ACM, 2013.
[44] S.B. Ahmed, S. Naz, Salahuddin, M.I. Razzak, A.A. Khan, A.I. Umar, "UCOM Offline Dataset a Urdu Handwritten Dataset Generation," accepted in IAJIT, unpublished.
[45] S.B. Ahmed, S. Naz, Salahuddin, M.I. Razzak, A.I. Umar, "Handwritten Urdu Character Recognition using Recurrent Neural Networks," accepted in Neural Computing and Application, unpublished.