Template-based Recognition of Online Handwriting


Template-based Recognition of Online Handwriting. PhD dissertation of Jakob Sternby, Lund University, Sweden. Opponent: Sargur N. Srihari, University at Buffalo, State University of New York. May 30, 2008.

Handwriting Recognition. Individual digits and characters; well-formed words; degenerate writing with non-standard letter shapes; ambiguous words needing linguistic context ("thread" or "shread"?).

Outline of Presentation
I. Overview and Handwriting Recognition (Chapters 1 and 2)
II. Research Contributions (Chapters 3-10)
III. Questions for Discussion

I. On-line Handwriting Recognition. Algorithms and software have been available for decades, with many strategies, especially for isolated characters. From a commercial perspective, it is not yet the preferred choice in PDAs, e.g., mobile phones, since error rates depend on the training data sets used and the types of error made affect user satisfaction.

New paradigm. Conventional approach: train, test and compare against a known data set. New approach: provide the user with tools to adapt toward writing styles that generate fewer conflicts, and work interactively with the user to decide the types of shapes to be used.

Unconstrained handwriting recognition. Not limited to isolated single characters: arbitrary connections between characters are allowed, which adds considerable complexity due to the segmentation problem. A common technique is to train multilayer networks for partial character recognition and combine the results using HMMs.

II. Research Contributions. Explicit modeling to make recognition limitations transparent: a template-based approach with a transparent database giving explicit information on the types of samples that can be recognized. Other factors: memory consumption, response time, and the hardware limitations of mobile platforms. Non-probabilistic exploration of sequence-based recognition. Arabic unconstrained recognition: the techniques are applicable to other scripts but require another shape definition database.

Methods and Results
Chapter 3: Preprocessing and Segmentation
Chapter 4: Additive Template Matching
Chapter 5: Connected Character Recognition with Graphs
Chapter 6: Delayed Strokes and Stroke Attachment
Chapter 7: Application of Linguistic Knowledge
Chapter 8: Clustering and Automatic Template Modeling
Chapter 9: Experiments
Chapter 10: Conclusions and Future Prospects

3. The Segmentation Problem. Partition the input data (strokes) into smaller entities. Given an input sample $X$ with strokes $X_1, \ldots, X_n$, each stroke is analyzed and divided into smaller parts (segments) according to certain characteristics. The task is one of clustering the points in each stroke into an unknown number of segments, which needs a cluster distance function $d(c_i, c_j)$. The set of clusters $C = \{c_1, \ldots, c_n\}$ is the minimum set of subsets of $X = \{p_1, \ldots, p_{|X|}\}$ such that $\max d(c_i, c_j) < T$, where $T$ is a threshold. Simple strategy: Vertical Extreme Point Segmentation, complemented with heuristics.
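As an illustration of the simple strategy named on this slide, the following is a minimal sketch of vertical extreme point segmentation: a stroke is cut wherever its vertical direction of travel reverses. The function names, the point representation as (x, y) tuples, and the handling of flat runs are assumptions made for this example; the heuristics the slide mentions are not modeled.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pen coordinates; representation is assumed


def vertical_extreme_indices(stroke: List[Point]) -> List[int]:
    """Return indices of interior points where vertical motion reverses."""
    cuts = []
    prev_dir = 0  # sign of the last non-zero vertical step
    for i in range(1, len(stroke)):
        dy = stroke[i][1] - stroke[i - 1][1]
        direction = (dy > 0) - (dy < 0)
        if direction == 0:
            continue  # flat step: keep the previous direction
        if prev_dir != 0 and direction != prev_dir:
            cuts.append(i - 1)  # the point before the reversal is the extreme
        prev_dir = direction
    return cuts


def segment_stroke(stroke: List[Point]) -> List[List[Point]]:
    """Split a stroke into segments that share their boundary points."""
    cuts = vertical_extreme_indices(stroke)
    bounds = [0] + cuts + [len(stroke) - 1]
    return [stroke[a:b + 1] for a, b in zip(bounds, bounds[1:]) if b > a]
```

Adjacent segments share the extreme point at which they meet, so no sample point is lost by the split.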

Segmentation & Shape Analysis. Structural information can be derived from the segmentation rule and can be used for aligning sample points. Example: the letter "n", with the four principal components $\mu + n\sigma u_j$, $n = 1, 2$, $j = 1, 2, 3, 4$. [Figure: original parameterization of samples of "n" and the reparameterization, which retains the discontinuity; panel labels: smooth, discontinuous.]

Parameterization of Segments. Use few points that capture as much shape information as possible. This can be seen as a problem of maximizing the resulting segment length.

Discrete Segment Shape Space. Fix a set of primitive shapes and approximate each input segment by the closest primitive. Primitives work across scripts.

4. Additive Template Matching. Template matching for on-line handwriting: techniques for matching discretely sampled curves, applied to the characters resulting from the segmentation strategy. Requires a search for a suitable distance function.

Frame Deformation Model. View the segments of a sample as connection points in a frame (structurally significant points). Find the best approximating sequence of models to the frame by minimizing bending energy, in analogy with coils and springs. Quantify the difference between a sample and a template sequence as the bending energy plus the deformation of the segment parameterizations.

Feature Space. Segmental features and frame features. Relative features enable conditional comparison of shape pairs. Features are normalized to the same value space.

Additive Template Distance. The distance should be independent of the segmentation: a concatenated template should produce the same distance as the matching sequence of its constituents. Addition of distances corresponds to concatenation of templates.

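Distance Function. Given a template $T$ and a sample $X$ with the same number of segments, their distance is [equation not preserved in the transcript], where $d(\lambda_j^T, \lambda_j^X)$ is [definition not preserved in the transcript], with $\Lambda_T$ and $\Lambda_X$ the segments of $T$ and $X$ respectively.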
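The distance formula itself is not preserved in this transcript, so the sketch below only illustrates the additivity property stated on the previous slide, under loudly stated assumptions: each segment is a tuple of numeric features, the per-segment distance is a plain squared difference (a stand-in, not the dissertation's function), and frame or connection features between neighboring segments are ignored. All names are hypothetical.

```python
from typing import Sequence, Tuple

Segment = Tuple[float, ...]  # assumed: a fixed-length feature tuple per segment


def segment_distance(a: Segment, b: Segment) -> float:
    """Stand-in per-segment distance (squared Euclidean); an assumption."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def template_distance(template: Sequence[Segment], sample: Sequence[Segment]) -> float:
    """Sum of per-segment distances; requires equal segment counts."""
    if len(template) != len(sample):
        raise ValueError("template and sample must have the same number of segments")
    return sum(segment_distance(t, x) for t, x in zip(template, sample))
```

With a distance of this summed form, matching a concatenated template against a concatenated sample gives the same value as adding the distances of the two parts, which is the property that lets partial matches be accumulated along a graph.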

5. Connected Character Recognition with Graphs. Extends the template matching scheme to arbitrary sequences of connected or non-connected characters. The graph technique becomes a powerful tool.

Template Sequences. Matching input against sequences of templates requires template connection operations and rules governing these (connectivity properties).

Segmentation Graph. Template distance calculations are applied to parts of the input. Edge value: the best cumulative distance to that edge, given the edges to its start node. [Figure: segmentation graph for the Swedish word "ek", with edges labeled by candidate templates (e, k, c, d, l, ligature).]

Segmentation Graph (2). The best path through the graph corresponds to the best approximating template sequence. [Figure: input sequence and template sequence.]
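The best-path idea can be sketched as a shortest-path computation over segmentation points. This is a minimal illustration, not the dissertation's implementation: the edge format `(start, end, label, distance)`, the function name, and the example distances are all assumptions, and connection rules and noise handling are left out.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str, float]  # (start point, end point, template label, distance)


def best_template_sequence(n_points: int, edges: List[Edge]) -> Tuple[float, List[str]]:
    """Return the lowest cumulative distance from point 0 to the last point,
    together with the template labels along that path."""
    inf = float("inf")
    best = [inf] * n_points   # best cumulative distance to each point
    back = [None] * n_points  # (previous point, label) on the best path
    best[0] = 0.0
    # Edges only go forward, so visiting them in order of start point
    # guarantees best[start] is final before it is used.
    for start, end, label, dist in sorted(edges, key=lambda e: e[0]):
        if best[start] + dist < best[end]:
            best[end] = best[start] + dist
            back[end] = (start, label)
    if best[-1] == inf:
        return inf, []  # no template sequence covers the whole input
    labels = []
    node = n_points - 1
    while node != 0:
        node, label = back[node]
        labels.append(label)
    return best[-1], labels[::-1]
```

For instance, with hypothetical edges for "e", a ligature and "k" competing against "c" and "l" over the same segmentation points, the function returns the cheapest complete covering of the input.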

Noise Treatment. Noise in segmented input can appear in the form of extra segments. Connections between templates need to be adjusted with respect to noise.

The Recognition Graph. A string expansion of the segmentation graph, stored in a trie structure. Beam search limits the number of strings kept at each segmentation point.
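A minimal sketch of the beam idea follows: hypothesis strings are expanded along the segmentation graph, and only the best few are kept at each segmentation point. The edge format, the `beam_width` parameter, and the use of a dictionary per point in place of a trie are assumptions made for brevity.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str, float]  # (start point, end point, character, distance)


def beam_recognize(n_points: int, edges: List[Edge], beam_width: int) -> List[Tuple[str, float]]:
    """Expand strings over the segmentation graph, keeping at most
    beam_width hypotheses per segmentation point."""
    outgoing: Dict[int, List[Tuple[int, str, float]]] = {}
    for start, end, char, dist in edges:
        outgoing.setdefault(start, []).append((end, char, dist))

    # hyps[p] maps a hypothesis string to its best cumulative distance at point p
    hyps: List[Dict[str, float]] = [dict() for _ in range(n_points)]
    hyps[0][""] = 0.0
    for p in range(n_points):
        kept = heapq.nsmallest(beam_width, hyps[p].items(), key=lambda kv: kv[1])
        hyps[p] = dict(kept)  # prune everything outside the beam
        for string, cost in kept:
            for end, char, dist in outgoing.get(p, []):
                candidate = string + char
                total = cost + dist
                if total < hyps[end].get(candidate, float("inf")):
                    hyps[end][candidate] = total
    return sorted(hyps[-1].items(), key=lambda kv: kv[1])
```

A narrower beam lowers memory use and response time at the risk of pruning the correct string early, which is the trade-off the experiment slides plot against accuracy.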

6. Delayed Stroke Handling. All strokes of a given character need not be made at the same time. Construct templates for delayed strokes separately from base shapes. Diacritics are small strokes, dots and accents.

Penup Attachment Variations. The goal is to include diacritic placement in the search for the best approximating template sequence. This introduces a comparison of different stroke attachment models of the input.

Delayed Strokes in the Recognition Graph (RG). Apply branch-and-bound thinking: impose lower bounds for the unknown parts of the recognition, based on the diacritic needs of the matched base shape symbols.

Stroke Attachment Variations. Each hypothesis string is associated with a set of stroke attachment variations, which becomes unique only after the first segmentation point of the last stroke has been passed during graph expansion.

7. Application of Linguistic Knowledge. Humans are very good at deciphering misspelled text: "Tihs is an eaxxplme of sverelly dgrdaded txeet!" The same ability is needed to decipher cursive writing, where many words are illegible out of context: "thread" or "thuearl"?

Missing character shapes in Arabic cursive writing. [Figure: a legible Arabic sample, and a sample with the letter "seen" missing.]

Use of Context. Humans use several types of linguistic context: semantic, grammatical and lexical. A lexicon is used to exclude non-words, and this is done efficiently during the recognition process.

Static Lexical Knowledge. Used to filter the list of hypotheses, but will still only enable recognition of matching template sequences. The trie format of the dictionary and the trie format of the recognition graph make dictionary lookup fast.
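The filtering described here can be sketched with a dictionary trie that rejects any hypothesis string that is not a prefix of some lexicon word. This is a minimal illustration using nested Python dicts; the function names and the `None` end-of-word marker are assumptions, and the coupling to the recognition graph's own trie is not shown.

```python
from typing import Dict, Iterable

END = None  # sentinel key marking the end of a word; cannot clash with characters


def build_trie(words: Iterable[str]) -> Dict:
    """Build a nested-dict trie from the lexicon."""
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def is_prefix(trie: Dict, string: str) -> bool:
    """True if some lexicon word starts with `string`."""
    node = trie
    for ch in string:
        if ch not in node:
            return False
        node = node[ch]
    return True


def is_word(trie: Dict, string: str) -> bool:
    """True if `string` is a complete lexicon word."""
    node = trie
    for ch in string:
        if ch not in node:
            return False
        node = node[ch]
    return END in node
```

During graph expansion, a partial hypothesis such as "thu" would be dropped as soon as `is_prefix` fails, so it never consumes a place among the retained hypotheses.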

Static Lexical Knowledge (2). Allows for a dramatic reduction in the number of hypotheses needed in the recognition graph.

8. Clustering/Template Modeling. Implicit modeling methods (neural networks, SVM) define decision boundaries; the properties of the recognition target are never explicitly given. They are widely used for on-line recognition. Explicit modeling (template based) can also learn from a training set.

Segmentation definition database. A set of noise-free models explicitly defining how a certain character can be written. Each model may have segmentational differences depending on the segmentation method used. [Figure: three allographs for "2" with different segmentation definitions.]

Forced Recognition. To automatically extract samples that correspond to a certain template in the database, forced recognition can be performed for labeling.

Dataset Cleaning. Forced recognition can also be used to clean a dataset of noise that should be avoided in training.

The Variation Graph. Parts of different samples can be combined to form new samples. With a graph structure, common elements can be shared. [Figure: graph base and variations, and the 8 possible sequences.]
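The slide's figure is not preserved, so the layout below is purely an assumption: three positions with two alternative parts each, which is one arrangement that yields eight sequences. The sketch only shows how a small number of stored parts can stand for a larger number of combined samples; the names are hypothetical.

```python
from itertools import product
from typing import List


def expand_variations(slots: List[List[str]]) -> List[List[str]]:
    """Enumerate every sequence that picks one alternative per position."""
    return [list(path) for path in product(*slots)]


# Assumed layout: 3 positions x 2 alternatives = 6 stored parts
slots = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
sequences = expand_variations(slots)
```

Here six stored parts represent eight distinct sequences, and the gap widens quickly as positions or alternatives are added.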

9. Experiments. Data sets for single character recognition: the UNIPEN/1a digit data set and proprietary Arabic single characters (Zi Documa). Data for unconstrained character recognition: user instructed to write Arabic word without instruction.

Arabic Word Recognition. Fast and memory efficient. Software based on the strategy presented here has been produced and has decent recognition results. [Figure: Top-10, Top-2 and Top-1 recognition accuracy as a function of memory usage. Recognition without dictionary.]

Arabic Word Recognition (2). [Figure: Top-10, Top-2 and Top-1 recognition accuracy as a function of response time (ms). Recognition without dictionary.]

10. Conclusions and Future Prospects. A fast and effective recognition algorithm based on additive template matching. Each part of the strategy has been implemented in its simplest form. The aim is to rejuvenate interest in the template-based approach.

Areas of Improvement. Goals: increase recognition accuracy and relax the neatness required of the writing; gain versatility toward a script-independent system and on-line shape sequence segmentation. Candidate areas: segmentation strategy; dynamic lexical lookup; optimization of weights in the segmental distance function; parameters in preprocessing; tuning of weights for the noise distance.

Quotes: Chapters 1-4. Overview; Handwriting Recognition; Preprocessing; Additive Template Matching.

Quotes: Chapters 5, 6, 7. Connected Character Recognition with Graphs; Delayed Strokes and Stroke Attachment; Linguistic Processing.

Quotes: Chapters 8, 9, 10. Clustering and Automatic Template Modeling; Experiments; Conclusions and Future Prospects.

III. Questions for Discussion
1. Performance (speed): characterize the time complexity of the algorithms.
2. Performance (accuracy):
   1. How does it compare to implicit modeling methods such as neural networks or SVMs? Since template matching is nearest-neighbor, is it close to the Bayes error rate (P < 2P*)?
   2. What is the effect of vocabulary size on recognition?
3. Template matching: how effective is user interactivity?
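For reference, the bound P < 2P* in the accuracy question is the classical asymptotic result of Cover and Hart for the single nearest-neighbor rule. In the notation assumed here, $P$ is the asymptotic nearest-neighbor error rate, $P^*$ is the Bayes error rate, and $c$ is the number of classes:

```latex
P^* \;\le\; P \;\le\; P^*\left(2 - \frac{c}{c-1}\,P^*\right) \;\le\; 2P^*
```

The bound holds in the limit of infinitely many training samples, so how close a finite template database comes to it is an empirical question rather than something the bound settles.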