Deep Learning and Optical Character Recognition


Faisal Shafait

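Artificial Neural Networks (ANNs)
Goal: make computers intelligent.
Idea: model the human brain.
[Figure: a biological neuron (dendrite, synapse, cell nucleus, axon) beside an artificial neuron with inputs $x_1, \dots, x_n$ and weights $w_1, \dots, w_n$]
The artificial neuron computes the weighted sum of its inputs,
$$h = \sum_{i=1}^{n} w_i x_i,$$
and its output $a$ is obtained by passing $h$ through an activation function.
12/29/2016 Shafait: Deep Learning and OCR SEECS, NUST 2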
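As a minimal sketch of this neuron model, the Python below computes the weighted sum $h$ and then applies an activation function. The function name `neuron` and the choice of a logistic sigmoid as the default activation are illustrative assumptions; the slide does not say which activation function is used.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, activation=sigmoid):
    """Artificial neuron: weighted sum h = sum_i w_i * x_i, then output a = activation(h)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    h = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return activation(h)


# Three inputs x_1..x_3 with weights w_1..w_3: h = 0.4 - 0.1 - 0.2 = 0.1
a = neuron([1.0, 0.5, -2.0], [0.4, -0.2, 0.1])
```

Passing the activation as a parameter keeps the weighted sum, which is common to all such units, separate from the nonlinearity, which varies between architectures.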

[Figure: a feed-forward network for image classification. The raw pixel intensities of an image (e.g. 255, 255, 240, 255, ...) form the input features $x_{i1}, \dots, x_{in}$, which are connected through weights $w_{i1}, \dots, w_{in}$ to units $h_i$ in the hidden layers; the output layer produces one activation $a_i$ per class (e.g. 0.01, 0.9, ..., 0.2).]
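The forward pass in this figure can be sketched in plain Python: pixels are scaled to [0, 1], passed through a hidden layer, and mapped to one score per class. The single tanh hidden layer, the softmax output and all weight values are assumptions made for illustration; the slide does not specify layer sizes or activations.

```python
import math


def dense(x, W, b, f):
    """Fully connected layer: out_j = f(sum_i W[j][i] * x[i] + b[j])."""
    return [f(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def softmax(z):
    """Turn output-layer scores into positive values that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(pixels, W1, b1, W2, b2):
    x = [p / 255.0 for p in pixels]    # scale 8-bit intensities to [0, 1]
    h = dense(x, W1, b1, math.tanh)    # one hidden layer
    z = dense(h, W2, b2, lambda v: v)  # linear output layer
    return softmax(z)                  # one activation per class


# Hypothetical weights: 4 input pixels, 2 hidden units, 3 classes.
W1 = [[0.5, -0.5, 0.2, 0.1], [-0.3, 0.8, 0.0, 0.4]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
b2 = [0.0, 0.0, 0.0]

probs = forward([255, 255, 240, 255], W1, b1, W2, b2)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

The predicted class is the index of the largest output activation, just as the 0.9 entry stands out in the figure.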

ANN Applications: Recognize Patterns
- Image analysis
  - Detection (e.g., disease)
  - Recognition (e.g., objects)
  - Identification (e.g., persons)
- Data mining
  - Classification
  - Change and deviation detection
  - Knowledge discovery
- Prognosis
  - Ozone prognosis
  - Weather forecast
  - Stock market prediction
- Games

The Rise and Fall of ANNs
- ANNs were widely used in the 1990s
- They suddenly went out of favour in the 2000s
- Renaissance: deep learning
Popular deep architectures:
- Neocognitron [Fukushima 1980]
- Recurrent Neural Networks [Hopfield 1982]
- Convolutional Neural Networks [LeCun 1989]
- Long Short-Term Memory Networks [Schmidhuber 1997]
- Deep Belief Networks [Hinton 2006]
- Self-Taught Learning [Ng 2007]
[Figure: a deep network with an input layer, hidden layers 1 to n, and an output layer]

Deep Learning Benchmarks
Highest accuracy on standard benchmarks:
- The MNIST Handwritten Digits Benchmark
- The NORB Object Recognition Benchmark
- The CIFAR Image Classification Benchmark
Winning competitions:
- ICDAR 2013 Arabic OCR Competition
- MICCAI 2013 Grand Challenge on Mitosis Detection
- IJCNN 2013 Traffic Sign Recognition Contest
- ICPR 2012 Contest on Mitosis Detection in Histological Images
- ISBI 2012 Brain Image Segmentation Challenge

Deep Learning with Long Short-Term Memory (LSTM) Networks

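Recurrent Neural Networks (RNNs)
[Figure: input layer, hidden layer with recurrent connections, output layer]
- Proposed by Hopfield in 1982
- Recurrent connections are added to keep information from previous time steps in the network, so context information is used
- New equation for the activation of hidden unit $h$ at time $t$:
$$b_h^t = \theta\left(\sum_i w_{ih} x_i^t + \sum_{h'} w_{h'h} b_{h'}^{t-1}\right),$$
where $x_i^t$ is input $i$ at time $t$, $w_{ih}$ are the input weights, $w_{h'h}$ are the recurrent weights, and $\theta$ is the hidden-layer activation function
- How can such networks be trained?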
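A minimal sketch of this recurrence in plain Python, assuming $\theta = \tanh$ and an all-zero initial state; the weight values and the function name `rnn_forward` are hypothetical. It shows that a hidden state keeps carrying information about an input after that input has gone back to zero.

```python
import math


def rnn_forward(xs, W_in, W_rec, theta=math.tanh):
    """Run a simple RNN over a sequence xs of input vectors.

    Implements b_h^t = theta(sum_i w_ih x_i^t + sum_h' w_h'h b_h'^(t-1)), with b^0 = 0.
    W_in[h][i] stores w_ih and W_rec[h][h2] stores w_(h2)h: one row per hidden unit h.
    """
    n_hidden = len(W_in)
    b_prev = [0.0] * n_hidden
    states = []
    for x in xs:
        b = []
        for h in range(n_hidden):
            a = sum(w * xi for w, xi in zip(W_in[h], x))         # input contribution
            a += sum(w * bp for w, bp in zip(W_rec[h], b_prev))  # recurrent contribution
            b.append(theta(a))
        states.append(b)
        b_prev = b
    return states


# Only the first input is non-zero, yet later hidden states stay non-zero:
# the recurrent weights carry that context forward in time.
W_in = [[0.5], [-0.5]]
W_rec = [[0.9, 0.0], [0.0, 0.9]]
states = rnn_forward([[1.0], [0.0], [0.0]], W_in, W_rec)
```

Setting `W_rec` to zeros turns this back into a feed-forward network applied independently at each step, and the later states then drop to zero.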

Training of RNNs: Backpropagation Through Time
[Figure: the network unfolded in time, with one copy of the input layer and hidden layer for each time step t-k, t-k+1, ..., t, and the output layer producing the output at t]
- Unfold the network in time for k time steps (k is a parameter)
- Perform backpropagation for the output at time t
- Repeat this for each $0 \le t \le T-1$
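The three steps can be sketched for the smallest possible case: a scalar RNN $h_s = \tanh(w_x x_s + w_h h_{s-1})$ with a squared-error loss at every time step. The scalar model, the loss and all numbers are assumptions for illustration. `bptt_grad` unfolds at most k steps back from t, and `total_grad` repeats this for every t; with k equal to the sequence length the result is the exact gradient, which the test checks against a finite-difference estimate.

```python
import math


def forward(xs, w_x, w_h):
    """Scalar RNN: h_s = tanh(w_x * x_s + w_h * h_(s-1)), with h_(-1) = 0."""
    hs, h = [], 0.0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        hs.append(h)
    return hs


def bptt_grad(xs, y, t, w_x, w_h, k):
    """dL_t/dw_h for L_t = 0.5 * (h_t - y)^2, unfolding at most k time steps back from t."""
    hs = forward(xs, w_x, w_h)
    delta = hs[t] - y  # dL_t/dh_t
    grad = 0.0
    for s in range(t, max(t - k, -1), -1):  # s = t, t-1, ..., max(t-k+1, 0)
        d_a = delta * (1.0 - hs[s] ** 2)    # back through tanh: dL_t/da_s
        h_prev = hs[s - 1] if s > 0 else 0.0
        grad += d_a * h_prev                # a_s depends on w_h via w_h * h_(s-1)
        delta = d_a * w_h                   # dL_t/dh_(s-1)
    return grad


def total_grad(xs, ys, w_x, w_h, k):
    """Repeat truncated BPTT for every output time step 0 <= t <= T-1 and sum.

    Recomputes the forward pass for each t: simple, not efficient.
    """
    return sum(bptt_grad(xs, ys[t], t, w_x, w_h, k) for t in range(len(xs)))


xs, ys = [0.5, -1.0, 0.3, 0.8], [0.1, -0.2, 0.0, 0.4]
g_full = total_grad(xs, ys, w_x=0.7, w_h=0.6, k=len(xs))  # no truncation
g_trunc = total_grad(xs, ys, w_x=0.7, w_h=0.6, k=2)       # unfold only 2 steps
```

Smaller k makes each update cheaper but ignores dependencies longer than k steps, which is why k is exposed as a parameter.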

Vanishing Gradient
[Figure: an RNN unrolled over time steps 1 to 7 (inputs, hidden layer, outputs), showing the influence of the input at the first time step fading over later time steps]
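Why the influence fades can be checked numerically. The sketch below assumes a scalar tanh RNN with made-up weights and tracks $|\partial h_t / \partial h_1|$ over seven time steps. Each step multiplies the derivative by $w_h (1 - h_t^2)$, and when that factor has magnitude below 1 the product shrinks geometrically.

```python
import math


def influence_of_first_step(xs, w_x, w_h):
    """|dh_t/dh_1| for t = 1..T in the scalar RNN h_t = tanh(w_x*x_t + w_h*h_(t-1)), h_0 = 0.

    By the chain rule dh_t/dh_(t-1) = w_h * (1 - h_t**2), so dh_t/dh_1 is a product
    of such factors; when each has magnitude below 1 the product shrinks geometrically.
    """
    h = math.tanh(w_x * xs[0])  # h_1
    g = 1.0                     # dh_1/dh_1
    out = [g]
    for x in xs[1:]:
        h = math.tanh(w_x * x + w_h * h)
        g *= w_h * (1.0 - h * h)
        out.append(abs(g))
    return out


# Seven time steps, as in the figure: the sensitivity to step 1 decays at every step.
sens = influence_of_first_step([1.0] * 7, w_x=1.0, w_h=0.5)
```

Here every factor is below 0.5, so after six steps the sensitivity is already under 0.5 ** 6, roughly 1.6 %.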

Core Idea: a New Memory Cell Instead of a Perceptron (the simple hidden unit is replaced by an LSTM memory cell)

No Vanishing Gradient
[Figure: the same network unrolled over time steps 1 to 7 with LSTM memory cells, in which the information from the input at the first time step is preserved over later time steps]
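A rough sketch of why the memory cell avoids this, assuming the LSTM variant with a forget gate, where the cell state is updated as $c_t = f_t c_{t-1} + i_t g_t$. Following only the direct cell-to-cell path (and ignoring paths through the gates), $\partial c_t / \partial c_1$ is just the product of the forget-gate activations, so it stays close to 1 while the gate is open. The gate pre-activation values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cell_path_sensitivity(forget_preacts):
    """dc_t/dc_1 along the memory cell path c_t = f_t * c_(t-1) + i_t * g_t.

    Only the direct cell-to-cell path is followed (paths through the gates are ignored),
    so the derivative is simply the running product of forget-gate activations f_t.
    """
    g = 1.0
    out = [g]
    for z in forget_preacts:
        g *= sigmoid(z)  # f_t in (0, 1)
        out.append(g)
    return out


open_gate = cell_path_sensitivity([6.0] * 6)     # forget gate close to 1: memory kept
closed_gate = cell_path_sensitivity([-6.0] * 6)  # forget gate close to 0: memory reset
```

Unlike the plain RNN, the decay here is controlled by learned gates rather than being forced by the weights and the slope of tanh.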

[Figure: an LSTM network in which the input layer connects to an input gate layer, a forget gate layer, an output gate layer and the hidden layer of memory cells, which feeds the output layer. Legend: sigmoid or tanh unit, multiplication, full connection, single connection.]
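One time step of such an LSTM layer can be sketched as follows. This assumes the widely used formulation with input, forget and output gates and no peephole connections; the figure does not give the exact equations, so the slide's variant may differ. The parameter layout (a `(W, U, b)` triple per gate) and all values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """W x + U h + b, with W and U given as lists of rows."""
    return [sum(w * xi for w, xi in zip(w_row, x))
            + sum(u * hi for u, hi in zip(u_row, h)) + bj
            for w_row, u_row, bj in zip(W, U, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One time step of an LSTM layer (forget gate, no peephole connections).

    params maps "i" (input gate), "f" (forget gate), "o" (output gate) and
    "g" (cell input) to a (W, U, b) triple.
    """
    def layer(name, act):
        W, U, b = params[name]
        return [act(z) for z in affine(W, x, U, h_prev, b)]

    i = layer("i", sigmoid)
    f = layer("f", sigmoid)
    o = layer("o", sigmoid)
    g = layer("g", math.tanh)
    # Multiplicative gating: keep f * old memory, write i * new candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# One input, one cell. Hypothetical biases: input gate shut (-100), forget gate open (+100),
# so the stored value 0.7 passes through the step unchanged.
zero = ([[0.0]], [[0.0]])
params = {
    "i": (*zero, [-100.0]),
    "f": (*zero, [100.0]),
    "o": (*zero, [0.0]),
    "g": ([[1.0]], [[0.0]], [0.0]),
}
h, c = lstm_step([2.0], [0.3], [0.7], params)
```

Because the gates multiply rather than add, a closed input gate and an open forget gate let the cell hold a value indefinitely, whatever the current input is.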

Bidirectional RNN
[Figure: a bidirectional RNN unrolled over time steps t-1, t and t+1. At each step the input layer feeds both a forward layer and a backward layer, which together form the hidden layer and feed the output layer.]
Trained with backpropagation through time: the forward pass runs through all time steps for each hidden layer (forward and backward) in turn.
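A minimal sketch of the bidirectional idea, assuming scalar tanh units and hypothetical weights: one RNN reads the sequence left to right, a second reads it right to left, and their states are combined at each time step. The output at the first time step then depends on the last input, which a forward-only RNN cannot see.

```python
import math


def run_direction(xs, w_x, w_h):
    """Scalar tanh RNN over xs in the given order; returns one hidden value per step."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        out.append(h)
    return out


def birnn(xs, fwd=(0.8, 0.5), bwd=(0.8, 0.5), w_out=(1.0, 1.0)):
    """Bidirectional RNN: output_t combines a forward state (past) and a backward state (future)."""
    forward_states = run_direction(xs, *fwd)               # reads t = 1 .. T
    backward_states = run_direction(xs[::-1], *bwd)[::-1]  # reads t = T .. 1, re-aligned to t
    return [w_out[0] * f + w_out[1] * b
            for f, b in zip(forward_states, backward_states)]


# Changing only the last input changes the output at the first time step,
# because the backward layer carries future context.
y_a = birnn([1.0, 0.0, 0.0])
y_b = birnn([1.0, 0.0, 1.0])
```

The weighted sum in `birnn` stands in for the output layer; a real network would typically feed both states into a full output layer.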

Optical Character Recognition (OCR) with MD-BLSTM
- Input: raw pixel data
- Output: machine-readable transcription
[Figure: example text line image, "Constitutional Irritation."]
Importance of context: is a round glyph a capital O, a lowercase o, the digit 0, or a mathematical circle symbol?
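The slide does not say how the network's per-frame outputs become a transcription. LSTM-based OCR systems commonly use Connectionist Temporal Classification (CTC) for this. Assuming CTC, the simplest decoder ("best path") takes the most likely label in each frame, merges consecutive repeats, and removes the blank label. The alphabet, the blank index 0 and the frame scores below are hypothetical.

```python
def ctc_best_path(frame_probs, alphabet, blank=0):
    """Greedy CTC decoding: argmax label per frame, merge consecutive repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars, prev = [], None
    for k in best:
        if k != prev and k != blank:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)


alphabet = ["<blank>", "O", "o", "0"]  # hypothetical label set
frames = [
    [0.1, 0.2, 0.6, 0.1],    # "o"
    [0.1, 0.1, 0.7, 0.1],    # "o" again: merged with the previous frame
    [0.8, 0.1, 0.05, 0.05],  # blank: separates two genuine "o"s
    [0.1, 0.1, 0.6, 0.2],    # "o"
    [0.2, 0.1, 0.1, 0.6],    # "0"
]
text = ctc_best_path(frames, alphabet)
```

Deciding between O, o and 0 happens in the per-frame scores, which a bidirectional network computes with context from both sides; the decoder itself only collapses them into a string.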

Scanning Neural Network Architecture (Best Student Paper Award)
Sheikh Faisal Rashid, Faisal Shafait, T. Breuel. Scanning Neural Network for Text Line Recognition. 10th IAPR Workshop on Document Analysis Systems (DAS'12), Gold Coast, Australia, Mar. 2012.

Latin OCR with BLSTM
Error rates (lower is better; "-" = no result):

OCR System    English   Fontana
OCRopus       2.14      -
Tesseract     1.30      0.91
FineReader    0.85      1.23
BLSTM         0.59      0.15

T. Breuel, Adnan ul Hasan, M. Al-Azawi, Faisal Shafait. High-Performance OCR for Printed English and Fraktur Using LSTM Networks. 12th Int. Conf. on Document Analysis and Recognition (ICDAR'13), Washington DC, USA, Aug. 2013.
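The slide does not state how these error rates are computed. A common OCR metric is the character error rate: the edit (Levenshtein) distance between the recognized text and the ground truth, divided by the length of the ground truth. A minimal sketch under that assumption, with the result expressed in percent:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length, in percent."""
    return 100.0 * edit_distance(reference, hypothesis) / max(len(reference), 1)


ref = "Constitutional Irritation."
cer = char_error_rate(ref, "Constitutiona1 Irritation.")  # one substitution: l -> 1
```

Keeping only two rows of the dynamic-programming table makes the memory use linear in the length of the hypothesis, which matters for long text lines.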

Urdu OCR with BLSTM
Challenges:
- Cursive script
- No word spacing
- Small inter-line gap

Urdu OCR with MD-BLSTM
Adnan ul Hasan, S. Ahmed, Sheikh Faisal Rashid, Faisal Shafait, T. Breuel. Offline Printed Urdu Nastaleeq Script Recognition with Bidirectional LSTM Networks. 12th Int. Conf. on Document Analysis and Recognition (ICDAR'13), Washington DC, USA, 2013.

Urdu OCR with MD-BLSTM
Saeeda Naz, A. Umar, R. Ahmad, M. I. Razzak, Sheikh Faisal Rashid, Faisal Shafait. Urdu Nastaliq Text Recognition using Implicit Segmentation based on Multi-Dimensional Long Short Term Memory Neural Networks. SpringerPlus, 2016.

Results of the ICDAR 2013 Arabic OCR Contest
The contest was organized in four challenges:
1. Font (B) in 12 pt size
2. Font (B) in multiple sizes
3. Font (I) in multiple sizes
4. All fonts in multiple sizes
Our system (jointly developed with Siemens) won the top place in all four challenges by a significant margin.
Fouad Slimane, Slim Kanoun, Haikal El Abed, Adel M. Alimi, Rolf Ingold, Jean Hennebert. ICDAR2013 Competition on Multi-font and Multisize Digitally Represented Arabic Text. 12th International Conference on Document Analysis and Recognition (ICDAR 2013), pp. 1433-1437.

Conclusion
- Deep learning architectures are modelled on the human brain
- Over the years they have become more powerful, thanks to:
  - better architectures and algorithms
  - faster hardware
- Diverse application areas
- Training deep architectures needs:
  - many CPU cores
  - a lot of patience
- Effective training remains an art [LeCun 2013]

Questions / Comments?