Template-based Recognition of Online Handwriting
PhD Dissertation of Jakob Sternby, Lund University, Sweden
Opponent: Sargur N. Srihari, University at Buffalo, State University of New York
May 30, 2008
Handwriting Recognition
- Individual digits and characters
- Well-formed words
- Degenerate writing with non-standard letter shapes
- Ambiguous word needing linguistic context: thread or shread?
Outline of Presentation
I. Overview and Handwriting Recognition (Chapters 1 and 2)
II. Research Contributions (Chapters 3-10)
III. Questions for Discussion
I. On-line Handwriting Recognition
- Algorithms/software have been available for decades
- Many strategies, especially for isolated characters
- Commercial perspective: not yet the preferred choice on PDAs and mobile phones, since
  - error rates depend on the training data sets used
  - the types of error made affect user satisfaction
New Paradigm
- Conventional approach: train, test and compare against a known data set
- New approach: provide the user with tools to adapt
  - user writing styles that generate fewer conflicts
  - interactive: the user decides which types of shapes are used
Unconstrained Handwriting Recognition
- Not limited to isolated single characters; arbitrary connections between characters
- Adds considerable complexity due to the segmentation problem
- Common technique: train multilayer networks for partial character recognition and combine the results using HMMs
II. Research Contributions
- Explicit modeling to make recognition limitations transparent
  - Template-based approach with a transparent database: explicit information on the types of samples that can be recognized
- Other factors: memory consumption, response time, hardware limitations on mobile platforms
- Non-probabilistic exploration of sequence-based recognition
- Arabic unconstrained recognition: the techniques are applicable to other scripts, but require another shape definition database
Methods and Results
Chapter 3: Preprocessing and Segmentation
Chapter 4: Additive Template Matching
Chapter 5: Connected Character Recognition with Graphs
Chapter 6: Delayed Strokes and Stroke Attachment
Chapter 7: Application of Linguistic Knowledge
Chapter 8: Clustering and Automatic Template Modeling
Chapter 9: Experiments
Chapter 10: Conclusions and Future Prospects
3. The Segmentation Problem
- Partition input data (strokes) into smaller entities
- Given an input sample X with strokes X_1, ..., X_n, each stroke is analyzed and divided into smaller parts (segments) according to certain characteristics
- The task is one of clustering the points of each stroke into an unknown number of segments
  - Needs a cluster distance function d(c_i, c_j)
  - The set of clusters C = {c_1, ..., c_n} is the minimum set of subsets of X = {p_1, ..., p_|X|} such that max d(c_i, c_j) < T, a threshold
- Simple strategy: vertical extreme point segmentation, complemented with heuristics
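The vertical extreme point strategy above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation (which complements the rule with heuristics): a stroke is a list of (x, y) points, and cuts are placed at strict local extrema of the y coordinate.

```python
def vertical_extreme_segmentation(stroke):
    """Split a stroke (list of (x, y) points) at vertical extreme points.

    Returns a list of segments; consecutive segments share the cut point.
    """
    cuts = [0]
    for i in range(1, len(stroke) - 1):
        prev_y, y, next_y = stroke[i - 1][1], stroke[i][1], stroke[i + 1][1]
        # A vertical extreme: y is a strict local maximum or minimum.
        if (y > prev_y and y > next_y) or (y < prev_y and y < next_y):
            cuts.append(i)
    cuts.append(len(stroke) - 1)
    # Each segment runs between consecutive cut points (inclusive ends).
    return [stroke[cuts[k]:cuts[k + 1] + 1] for k in range(len(cuts) - 1)]
```

For a v-shaped stroke, the single vertical minimum becomes the one cut point, yielding two segments.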
Segmentation & Shape Analysis
- Structural information can be derived from the segmentation rule and used for aligning sample points
- Example: the letter n
  - Four principal components (µ + nσ u_j), n = 1, 2; j = 1, 2, 3, 4
  - [Figure: original parameterization of samples of n (smooth) vs. reparameterization (retains the discontinuity)]
Parameterization of Segments
- Use few points that capture as much shape information as possible
- Can be seen as the problem of maximizing the resulting segment length
Discrete Segment Shape Space
- Fix a set of primitive shapes
- Approximate each input segment by the closest primitive
- Primitives work across scripts
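A toy sketch of mapping a segment to its closest primitive. The primitive set, its names, and the two features (net direction angle plus a curvature proxy) are illustrative assumptions, not the dissertation's actual primitives; the point is only the nearest-neighbor lookup in a small discrete shape space.

```python
import math

# Hypothetical primitive set: (net direction angle, signed curvature proxy).
PRIMITIVES = {
    "up":        (math.pi / 2, 0.0),
    "down":      (-math.pi / 2, 0.0),
    "arc_left":  (math.pi / 2, 1.0),
    "arc_right": (math.pi / 2, -1.0),
}

def segment_features(points):
    """Coarse, scale-invariant features: chord direction and the signed
    deviation of the midpoint from the chord (a curvature proxy)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    angle = math.atan2(y1 - y0, x1 - x0)
    chord = math.hypot(x1 - x0, y1 - y0) or 1.0
    xm, ym = points[len(points) // 2]
    cross = ((x1 - x0) * (ym - y0) - (y1 - y0) * (xm - x0)) / chord ** 2
    return angle, 2.0 * cross

def closest_primitive(points):
    """Return the name of the primitive nearest in feature space."""
    a, c = segment_features(points)
    return min(PRIMITIVES, key=lambda p: (PRIMITIVES[p][0] - a) ** 2
                                         + (PRIMITIVES[p][1] - c) ** 2)
```

A straight upward segment maps to "up"; an upward segment bulging to the right maps to "arc_right".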
4. Additive Template Matching
- Template matching for on-line handwriting
- Techniques for matching discretely sampled curves, applied to characters resulting from the segmentation strategy
- Search for a suitable distance function
Frame Deformation Model
- View the segments of a sample as connection points in a frame (structurally significant points)
- Find the best approximating sequence of models to the frame by minimizing bending energy (analogy with coils and springs)
- Quantify the difference between sample and template sequence as bending energy plus deformation of the segment parameterizations
Feature Space
- Segmental features and frame features
- Relative features enable conditional comparison of shape pairs
- Features are normalized to the same value space
Additive Template Distance
- The distance should be independent of the segmentation: a concatenated template should produce the same distance as matching the sequence of its constituents
- [Figure: addition vs. concatenation]
Distance Function
Given a template T and a sample X with the same number of segments, their distance is

D(T, X) = Σ_j d(λ_j^T, λ_j^X),

where d(λ_j^T, λ_j^X) is the segmental distance, with Λ_T = {λ_j^T} and Λ_X = {λ_j^X} the segments of T and X respectively.
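The additivity property is easy to see in code. This is a minimal sketch under simplifying assumptions: each segment is already reduced to a feature vector, and the per-segment distance is plain squared Euclidean distance (a stand-in for the dissertation's weighted segmental distance).

```python
def segment_distance(feat_t, feat_x):
    """Illustrative per-segment distance: squared Euclidean."""
    return sum((a - b) ** 2 for a, b in zip(feat_t, feat_x))

def additive_distance(template_segs, sample_segs):
    """Total distance = sum of per-segment distances.

    Because the total is a plain sum, matching a concatenated template
    gives the same value as summing the matches of its constituents.
    """
    assert len(template_segs) == len(sample_segs)
    return sum(segment_distance(t, x)
               for t, x in zip(template_segs, sample_segs))
```

Concatenating two templates and their matching sample parts yields exactly the sum of the two individual distances, which is the segmentation-independence requirement stated above.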
5. Connected Character Recognition with Graphs
- Extends the template matching scheme to arbitrary sequences of connected or non-connected characters
- The graph technique becomes a powerful tool
Template Sequences
Matching input against sequences of templates requires:
- template connection operations
- rules governing these (connectivity properties)
Segmentation Graph
- Template distance calculations to parts of the input
- Edge value: best cumulative distance to that edge, given the edges back to the start node
- [Figure: segmentation graph for the Swedish word 'ek', with edges labeled by letter and ligature hypotheses (e, k, c, l, d, ligatures)]
Segmentation Graph (2)
- The best path through the graph corresponds to the best approximating template sequence
- [Figure: input sequence aligned with template sequence]
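Finding the best path through such a graph is a dynamic program over the segmentation points. The sketch below is illustrative: nodes 0..n-1 are segmentation points, and each edge (src, dst, template, dist) means a template matched the input segments between src and dst at cost dist. The edge representation and example values are assumptions for this sketch.

```python
def best_path(n_points, edges):
    """Best cumulative distance (and template sequence) through a
    segmentation graph whose edges always go forward (dst > src)."""
    best = {0: (0.0, [])}  # node -> (cumulative distance, templates so far)
    for i in range(n_points):
        if i not in best:
            continue
        d_i, seq_i = best[i]
        for (src, dst, template, dist) in edges:
            if src != i:
                continue
            cand = (d_i + dist, seq_i + [template])
            # Keep only the cheapest way to reach each node.
            if dst not in best or cand[0] < best[dst][0]:
                best[dst] = cand
    return best.get(n_points - 1)
```

Processing nodes in increasing order is a valid topological order here because segmentation-graph edges never point backwards.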
Noise Treatment
- Noise in the segmented input can appear in the form of extra segments
- Connections between templates need to be adjusted with respect to noise
The Recognition Graph
- A string expansion of the segmentation graph (trie structure)
- Beam search limits the number of strings kept at each segmentation point
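The beam limit amounts to a simple pruning step at each segmentation point. A minimal sketch, with hypotheses represented as (cumulative distance, string) pairs; the representation and values are illustrative assumptions.

```python
def prune_beam(hypotheses, beam_width):
    """Keep only the `beam_width` hypothesis strings with the lowest
    cumulative distance at a segmentation point."""
    return sorted(hypotheses, key=lambda h: h[0])[:beam_width]
```

Applied after every expansion step, this bounds the number of strings carried forward, trading a small risk of pruning the true answer for a large reduction in search cost.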
6. Delayed Stroke Handling
- All strokes of a given character need not be made at the same time
- Construct templates for delayed strokes separately from base shapes
- Diacritics are small strokes, dots and accents
Penup Attachment Variations
- Want to include diacritic placement in the search for the best approximating template sequence
- Introduces comparison of different stroke attachment models of the input
Delayed Strokes in the Recognition Graph
- Apply branch-and-bound thinking: impose lower bounds for the unknown parts of the recognition based on the diacritic needs of the matched base shape symbols
Stroke Attachment Variations
- Each hypothesis string is associated with a set of stroke attachment variations
- These become unique only after the first segmentation point of the last stroke has been passed during graph expansion
7. Application of Linguistic Knowledge
- Humans are very good at deciphering misspelled text: Tihs is an eaxxplme of sverelly dgrdaded txeet!
- This ability is needed to decipher cursive writing; many words are illegible out of context: thread or thuearl?
Missing Character Shapes in Arabic Cursive Writing
- [Figure: a legible Arabic sample vs. a sample with the letter seen missing]
Use of Context
- Humans use several types of linguistic context: semantic, grammatical, lexical
- A lexicon is used to exclude non-words, efficiently during the recognition process
Static Lexical Knowledge
- Used to filter the list of hypotheses, but still only enables recognition of matching template sequences
- The trie format of the dictionary and the trie format of the recognition graph make dictionary lookup fast
Static Lexical Knowledge (2)
- Allows a dramatic reduction in the number of necessary hypotheses in the recognition graph
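Because both the dictionary and the recognition graph are tries, pruning reduces to a prefix test: a hypothesis string is kept only while it is still a prefix of some dictionary word. A minimal sketch (class and function names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

def build_trie(words):
    """Build a trie over the dictionary."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root

def is_valid_prefix(root, s):
    """True iff some dictionary word starts with s, i.e. the hypothesis
    can still grow into a word and should not be pruned."""
    node = root
    for ch in s:
        node = node.children.get(ch)
        if node is None:
            return False
    return True
```

Each lookup costs O(len(s)) regardless of dictionary size, which is why the trie-on-trie combination keeps lexical filtering fast during graph expansion.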
8. Clustering/Template Modeling
- Implicit modeling methods (neural networks, SVMs) define decision boundaries
  - Properties of the recognition target are never explicitly given
  - Widely used for on-line recognition
- Explicit (template-based) modeling can also learn from a training set
Segmentation Definition Database
- A set of noise-free models explicitly defining how a certain character can be written
- Each model may have segmentational differences depending on the segmentation method used
- [Figure: three allographs for 2 with different segmentation definitions]
Forced Recognition
- To automatically extract samples that correspond to a certain template in the database, forced recognition can be performed for labeling
Dataset Cleaning
- Forced recognition can also be used to clean a dataset of noise that should be avoided in training
The Variation Graph
- Parts of different samples can be combined to form new samples
- With a graph structure, common elements can be shared
- [Figure: graph base and variations; the 8 possible sequences]
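The combinatorics behind the "8 possible sequences" can be sketched directly: with alternative variants at each position sharing a common base, every combination of variants yields a distinct sequence. The variant names below are placeholders, not the dissertation's data.

```python
from itertools import product

def expand_variations(positions):
    """positions: list of lists of alternative segments per position.

    Returns every sequence obtainable by picking one variant at each
    position -- the set of samples the variation graph represents.
    """
    return [list(combo) for combo in product(*positions)]

# Three positions with 2 variants each -> 2 * 2 * 2 = 8 possible sequences.
samples = expand_variations([["a1", "a2"], ["b1", "b2"], ["c1", "c2"]])
```

The graph representation stores only 6 variants yet stands for all 8 sequences, which is the space saving the slide refers to.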
9. Experiments
Data sets:
- Single character recognition: UNIPEN/1a digit data set; proprietary Arabic single characters (Zi Decuma)
- Unconstrained character recognition: users asked to write Arabic words without instructions on writing style
Arabic Word Recognition
- Fast and memory efficient: software based on the strategy presented here has been produced
- Decent recognition results
- [Figure: Top-1, Top-2 and Top-10 recognition accuracy as a function of memory usage; recognition without dictionary]
Arabic Word Recognition (2)
- [Figure: Top-1, Top-2 and Top-10 recognition accuracy as a function of response time (ms); recognition without dictionary]
10. Conclusions and Future Prospects
- A fast and effective recognition algorithm based on additive template matching
- Each part of the strategy has been realized in its simplest form
- Aims to rejuvenate interest in the template-based approach
Areas of Improvement
To increase recognition accuracy, relax demands on neatness of writing, and gain versatility (a script-independent system, on-line shape sequence segmentation):
- Segmentation strategy
- Dynamic lexical lookup
- Optimization of the weights in the segmental distance function
- Parameters in preprocessing
- Tuning of the weights for the noise distance
Quotes: Chapters 1-4
- Overview
- Handwriting Recognition
- Preprocessing
- Additive Template Matching
Quotes: Chapters 5, 6, 7
- Connected Character Recognition with Graphs
- Delayed Strokes and Stroke Attachment
- Linguistic Processing
Quotes: Chapters 8, 9, 10
- Clustering and Automatic Template Modeling
- Experiments
- Conclusions and Future Prospects
III. Questions for Discussion
1. Performance (speed): characterize the time complexity of the algorithms
2. Performance (accuracy):
   a. How does it compare to implicit modeling methods such as neural networks or SVMs? Since template matching is nearest-neighbor, is it close to the Bayes error rate (P* ≤ P ≤ 2P*)?
   b. What is the effect of vocabulary size on recognition?
3. Template matching: how effective is user interactivity?