
Phrase-based Direct Model for Improving Handwriting Recognition Accuracies
Damien Jose, dsjose@cubs.buffalo.edu

Agenda
- Importance of improving handwritten word recognition accuracy
- Phrase-based direct model approach to improve accuracy
- Experiments
- Results

Typical Document Analysis and Recognition System
Line Segmentation -> Word Segmentation -> Handwriting Recognizer -> Recognized Text -> Information Retrieval / Word Spotting / Topic Modeling / Document Classification
The Handwriting Recognizer (HR) is a crucial component of any document analysis & retrieval system.

Motivation
- Component systems are often developed independently by different groups.
- Internals of one component are not accessible to the developers of the next component in the pipeline.
- These components (e.g. the HR) are treated as black boxes where only their output is observed.
- Output of these systems is error-prone; word recognition is definitely a bottleneck.
Performance of line separation, word segmentation and word recognition was measured on 20 document images of different writing styles.

Drawbacks of existing approaches to OCR post-processing
Improving the performance of the recognizer will enrich the overall user experience.
- Jones et al. [1] describe a multi-pass OCR post-processing system which carries out individual word corrections, combined edit-distance corrections and bigram-probability-based corrections in different passes.
- Perez-Cortes et al. [2] use a stochastic finite-state machine to test word hypotheses. If the machine accepts the word, no correction is made; otherwise the smallest set of transitions that could not be traversed indicates the most similar string in the model.
- Pal et al. [3] describe a method for OCR error correction of Devanagari script using morphological parsing.
Problems with these approaches include:
- Using features that are language dependent.
- Application to machine-print OCR, which conventionally uses character models, as opposed to HR systems that follow a word-based multiple-choice paradigm.
- Training the character confusion matrices is not straightforward.

Proposed Approach
- Analogous to SMT, the problem is viewed as a direct phrase-based translation task.
- The HR output can be visualized as a noisy black box: the signal (truth), when passed through it, gets corrupted and emerges as the degraded output.
- We hope to model the inherent noise of the OCR and create an invertible transform to regenerate the truth from the corrupt output.
[Diagram: pipeline Stage N-1 -> HR -> Stage N+1, shown (a) as-is and (b) with a Correction Model inserted after the HR output (N) to produce the corrected output (O).]

Correction Model
Given sentence pairs in the source (Foreign/Corrupt) and the target (English/Truth) languages:
- Align words in the source and target sentences (e.g. using Levenshtein distance).
- Extract phrase pairs.
- Combine the noise model with an n-gram language model to translate the source language into the target language.
Given: Target = Truth, Source = OCR output,

    ê = arg max_e [ w_ph * log10 P(src | tgt) + w_lm * log10 P(tgt) ]

where:
- e is the current hypothesis and ê the extended (best) hypothesis,
- w_ph is the phrase-model weight and w_lm the language-model weight,
- P(tgt) is a trigram language model trained on Reuters data,
- P(src | tgt) is a phrase model trained on Conference on Computational Natural Language Learning (CoNLL) 2003 data.
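The arg max above is a weighted log-linear combination of the two model scores. A minimal sketch of scoring a single hypothesis (the function name is illustrative, and the 0.5/0.5 default weights are placeholders; the slides only require that the weights sum to 1):

```python
import math

def hypothesis_score(p_src_given_tgt, p_tgt, w_ph=0.5, w_lm=0.5):
    """Weighted log-linear score of one hypothesis:
    w_ph * log10 P(src|tgt) + w_lm * log10 P(tgt), as in the arg max above."""
    return w_ph * math.log10(p_src_given_tgt) + w_lm * math.log10(p_tgt)
```

The decoder keeps whichever hypothesis maximizes this score.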

Steps Involved
- Handwritten words are generated from CoNLL English text by concatenating character templates generated by Blum's MAT, followed by character autoscaling, automatic baseline determination, ligature modeling, ligature joining, skeleton thickening and smoothing [4].
- The in-house HR used for recognition is a lexicon-driven, HMM-based word-model recognizer.
- Alignments between input and output are done using Levenshtein edit distance.
- Data is split into a training set (75%) and a test set (25%).
- Training and testing were done with a closed lexicon; 5% OOVs were present in the test set.
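The Levenshtein alignment step above can be sketched as standard edit-distance dynamic programming over word sequences. This is an illustrative reimplementation, not the in-house tool:

```python
def align_words(src, tgt):
    """Align two word sequences with Levenshtein DP and return
    (src_word, tgt_word) pairs; None marks an insertion or deletion."""
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    # Backtrace to recover the aligned word pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] else 1
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + cost:
            pairs.append((src[i - 1], tgt[j - 1])); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((src[i - 1], None)); i -= 1
        else:
            pairs.append((None, tgt[j - 1])); j -= 1
    return pairs[::-1]
```

Running it on a recognizer output and its truth sentence yields the (corrupt, true) word pairs from which phrase pairs are extracted.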

Phrase Model Probability

Recognized word | Hidden words (probability)
pitcher         | pitcher (1.00)
pascolo         | financially (0.40), pascolo (0.20), speculates (0.20), poisonous (0.20)
notation        | protection (0.50), invitation (0.40), motivation (0.05), notation (0.05)
experts         | experts (0.88), expired (0.13)
updated         | injunction (0.40), uprooted (0.20), infrastructural (0.40)
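A table like this can plausibly be built by relative-frequency counting over the aligned word pairs. The estimation scheme below is an assumption; the slide shows the resulting table but not how it was computed:

```python
from collections import Counter, defaultdict

def phrase_probs(aligned_pairs):
    """Relative-frequency estimate of P(hidden | recognized) from
    aligned (recognized, hidden) word pairs."""
    counts = defaultdict(Counter)
    for rec, hid in aligned_pairs:
        counts[rec][hid] += 1
    return {rec: {h: c / sum(cs.values()) for h, c in cs.items()}
            for rec, cs in counts.items()}

# Seven correct alignments and one confusion reproduce the "experts" row:
pairs = [("experts", "experts")] * 7 + [("experts", "expired")]
probs = phrase_probs(pairs)  # probs["experts"]["experts"] == 0.875 (slide rounds to 0.88)
```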

Viterbi Decoding: Combining the Phrase and Language Models
To correct the OCR output for a given test sentence, we translate the sentence by decoding with two weighted components: the phrases obtained above and the language model. Formally, the final decoding e for the source f is the one that satisfies:

    e* = arg max_e [ w_ph * log10 P(e | f) + w_lm * log10 P(e) ]

where P(e) is the trigram character language model probability and P(e | f) is the phrase-based direct model. The weights w_ph and w_lm were chosen such that w_ph + w_lm = 1 for the two mixture components. Whenever a test word is not found in the training model, we use the top-10 unigram outputs from the word recognizer for that word image.
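As an illustrative sketch of the decoding idea (not the authors' implementation): each recognized word expands into the candidates the phrase model offers for it, and Viterbi search over this word lattice picks the best-scoring path. A bigram LM stands in here for the slides' trigram model, and unseen words simply fall back to themselves rather than to the recognizer's top-10 unigram outputs:

```python
import math

def viterbi_decode(obs, phrase, lm_bigram, w_ph=0.5, w_lm=0.5):
    """Lattice Viterbi: score paths by a weighted sum of phrase-model and
    bigram-LM log10 probabilities; return the best hidden word sequence."""
    prev = {"<s>": (0.0, [])}  # history word -> (best score, path)
    for word in obs:
        cur = {}
        # Fall back to the word itself when it is unseen in the phrase model
        # (a simplification of the slides' top-10 unigram backoff).
        for cand, p_emit in phrase.get(word, {word: 1.0}).items():
            for hist, (score, path) in prev.items():
                p_lm = lm_bigram.get((hist, cand), 1e-6)  # crude LM smoothing
                s = score + w_ph * math.log10(p_emit) + w_lm * math.log10(p_lm)
                if cand not in cur or s > cur[cand][0]:
                    cur[cand] = (s, path + [cand])
        prev = cur
    return max(prev.values(), key=lambda t: t[0])[1]

# Toy example in the spirit of the result slide:
phrase = {"om": {"the": 1.0}, "sort": {"said": 0.9, "sort": 0.1}}
lm = {("<s>", "the"): 0.5, ("the", "said"): 0.4, ("the", "sort"): 0.01}
decoded = viterbi_decode(["om", "sort"], phrase, lm)  # ["the", "said"]
```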

Result of Viterbi decoding using the Phrase Model and Language Model (hidden-word candidates with cumulative log probabilities):

Recognized word | Hidden-word candidates (log probability)
om              | the (0)
official        | officials (7.89912), official (6.79544)
sort            | said (15.2251)
om              | the (24.5661)
accord          | amanda (54.0033), attackers (50.6336), antara (57.4298)
had             | had (86.9505), batter (99.1431), stated (97.6456)
Gerg            | seized (143.344), leaving (152.173), tsang (152.516), freeze (156.673)
tra             | two (239.679)
Roberta         | kalashnikov (389.403), nagatsuka (401.44)
dealt           | assault (639.717), decades (649.419), consistent (650.351), nationwide (650.707)
aples           | wales (1043.42), maybe (1043.54), rifles (1036.11), engineers (1046.72)
ad              | and (1679.26), cash (1691.12), seed (1691.66)
disappears      | disappointed (2728.21), disappeared (2721.85), disappears (2731.83)

Decoded sentence: the official said the attackers had seized two kalashnikov assault rifles and disappeared

Results
Raw, with-LM, and noise-corrected accuracies of the recognizer were measured on the test set before and after the correction. We observe a considerable increase in accuracy after the noise correction.
Advantages
- The technique is adaptable to other recognizers, and even to other scripts where training data is available.
- Fast decoding: the Viterbi sentence-decoding matrix shows the correction-model options for the observed recognizer output with the corresponding probabilities.
- Models errors in the phrase context.
Disadvantage
- Possible overfitting on synthetic data.

References
1. L. Bahl, F. Jelinek and R. Mercer, "A Maximum Likelihood Approach to Continuous Speech Recognition," IEEE Transactions on PAMI, 5(2):179-190, 1983.
2. H. Blum, "A Transformation for Extracting New Descriptors of Shape," Models for the Perception of Speech and Visual Form, MIT Press, Cambridge, MA, 1967, pp. 362-380.
3. A. Ittycheriah and S. Roukos, "A Maximum Entropy Word Aligner for Arabic-English Machine Translation," Proceedings of the Human Language Technology Conference (HLT-NAACL), Vancouver, Canada, 2005.
4. M. Jones, G. Story and B. Ballard, "Integrating Multiple Knowledge Sources in a Bayesian OCR Post-processor," International Conference on Document Analysis and Recognition, St. Malo, France, 1991, pp. 925-933.
5. G. Kim, V. Govindaraju and S. Srihari, "Architecture for Handwriting Recognition Systems," International Journal of Document Analysis and Recognition, 2(1):37-44, 1999.

Thank You