CHAPTER 3
SYNTACTIC PATTERN RECOGNITION TECHNIQUES FOR OBJECT IDENTIFICATION

3.1. Introduction

Pattern recognition problems may be logically divided into two major categories: (i) the study of the pattern recognition capabilities of human beings and (ii) the development of theory and techniques for the design of devices that perform a pattern recognition task for a specific application. Pattern recognition can be formally defined as the categorization of input data into identifiable classes via the extraction of significant features or attributes of the data from a background of irrelevant detail. A pattern class is a category determined by some given common attributes. A pattern is the description of any member of a category representing a pattern class. When a set of patterns of different classes is available, it is necessary to categorize these patterns into their respective classes through the use of some automatic device.

3.2. Design Concepts of Automatic Pattern Recognition

The design concepts for automatic pattern recognition are motivated by the ways in which pattern classes are characterized and defined [27]. Three basic design concepts are discussed below.

3.2.1. Membership-roster Concept

Characterization of a pattern class by a roster of its members suggests automatic pattern recognition by template matching. The set of patterns belonging to the same pattern class is stored in the pattern recognition system. When an unknown pattern is given to the system, it is compared with the stored patterns, and the system classifies this input pattern as a member of a pattern class if it matches one of the stored patterns belonging to that class. The membership-roster approach works well for nearly perfect, noise-free pattern samples.

3.2.2. Common-property Concept

Characterization of a pattern class by the common properties shared by all of its members suggests the common-property concept of automatic pattern recognition. The common properties or attributes, which reflect similarities among the patterns of a class, are stored in the pattern recognition system. When an unknown pattern is observed by
the system, its features are extracted, sometimes coded, and then compared with the stored features. The recognition scheme classifies the new pattern as belonging to a pattern class if its features match any of the stored features of that class. The main objective in this approach is therefore to determine common properties from a finite set of sample patterns and to examine a new pattern for a suitable match.

3.2.3. Clustering Concept

When the patterns of a class are vectors whose components are real numbers, a pattern class can be characterized by its clustering properties in the pattern space. A pattern recognition system based on this concept can be designed using the relative geometrical arrangement of the target vectors. Unknown patterns can be classified easily if the target vectors are far apart in their geometrical arrangement. A simple recognition scheme used in such cases is the minimum-distance classifier. On the other hand, when the clusters overlap, more sophisticated techniques are needed for partitioning the pattern space.

3.3. Methodologies

The basic design concepts of automatic pattern recognition systems described above make use of three categories of methodologies: (i)
heuristic, (ii) mathematical, and (iii) linguistic or syntactic. Sometimes a combination of these methods is used in the design of a pattern recognition system.

3.3.1. Heuristic Methods

The heuristic approach is based on human intuition and experience, making use of the membership-roster and common-property concepts. A system designed using this approach usually consists of ad hoc procedures developed for specialized recognition tasks. The heuristic approach is an important branch of pattern recognition system design, but it lacks generality, since each problem requires the application of specifically tailored design rules.

3.3.2. Mathematical Methods

The mathematical approach is based on classification rules that are formulated and derived in a mathematical framework, making use of the common-property and clustering concepts. It may be subdivided into two categories: deterministic and statistical. The deterministic approach is based on a mathematical framework that does not explicitly employ the statistical properties of the pattern classes under consideration. The statistical approach is based on mathematical classification rules that are formulated and derived in a statistical framework.
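
As an illustration of the deterministic approach combined with the clustering concept of Section 3.2.3, a minimum-distance classifier can be sketched as follows. This is only a minimal sketch: the function name, the class labels W1 and W2, and the prototype coordinates are hypothetical values chosen for the example, and Euclidean distance is assumed as the distance measure.

```python
import math

def minimum_distance_classify(pattern, prototypes):
    """Return the label of the prototype nearest (Euclidean) to the pattern.

    prototypes maps a class label to one representative vector per class.
    """
    best_label = None
    best_dist = math.inf
    for label, proto in prototypes.items():
        dist = math.dist(pattern, proto)  # Euclidean distance between two points
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical, well-separated class prototypes in a two-dimensional pattern space.
prototypes = {"W1": (0.0, 0.0), "W2": (5.0, 5.0)}

label_a = minimum_distance_classify((1.0, 0.5), prototypes)
label_b = minimum_distance_classify((4.0, 6.0), prototypes)
print(label_a, label_b)
```

Such a scheme is adequate only when the clusters are far apart; overlapping clusters call for the more sophisticated partitioning techniques mentioned earlier.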

3.3.3. Linguistic (Syntactic) Methods

Characterization of patterns by subpatterns and their relationships suggests automatic pattern recognition by the linguistic or syntactic approach, making use of the common-property concept. A pattern can be described by a hierarchical structure of subpatterns analogous to the syntactic structure of formal languages [67]. This permits the use of formal languages for tackling pattern recognition problems. A pattern grammar consists of finite sets of elements called variables, primitives, and productions. The production rules determine the type of grammar to be used for pattern recognition. Among the most studied grammars are regular grammars, context-free grammars, and context-sensitive grammars. The approach involves selecting pattern primitives, assembling the primitives and their relationships into pattern grammars, and carrying out analysis and recognition using the rules of these grammars. This approach is also useful in dealing with patterns that cannot be conveniently described by numerical measurements.

3.4. Syntactic Pattern Recognition (SPR)

Among the various techniques for object recognition, the syntactic pattern recognition technique is generally preferred when high-speed
recognition is a matter of concern. The idea behind syntactic pattern recognition is the specification of a set of pattern primitives, a set of rules that governs their interconnection, and a recognizer whose structure is determined by the set of rules in the grammar.

The description of an object is called a pattern. When a person perceives a pattern, that person makes an inductive inference and associates the perception with some concepts or clues derived from past experience. Thus the problem of pattern recognition may be regarded as a classification process that discriminates input data not between individual patterns but between pattern classes, via a search for certain invariant attributes among the members of a class.

The patterns used in the process of pattern recognition for identification and classification are either spatial or temporal. Spatial patterns are those that occupy space, such as characters, fingerprints, weather maps, physical objects, and pictures. Temporal patterns are time based, such as speech waveforms, electrocardiograms, target signatures, and time series.

Syntactic pattern recognition is a new approach to pattern recognition that utilizes the concepts of formal language theory. The term syntactic pattern recognition is synonymous with linguistic pattern recognition, grammatical pattern recognition, and structural pattern recognition. The difference between the mathematical approach and
the syntactic approach is that the latter explicitly utilizes the structure of patterns in the recognition process, whereas the mathematical approach deals with patterns on a strictly quantitative basis.

3.5. Formal Language Theory

Syntactic pattern recognition follows the theory of formal languages. The origin of formal language theory may be traced to the mid-1950s, with Noam Chomsky's development of a mathematical model of grammar related to his work on natural languages. The concepts needed to understand formal language theory are defined below:

An alphabet is any finite set of symbols.
A word over an alphabet is any string of finite length composed of symbols from the alphabet. For example, valid words over the alphabet {0, 1} include 0, 1, 00, 01, 10, and 11. A word with no symbols is called the empty word and is denoted by Λ.
A language is any set of words over an alphabet.
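
The three definitions above can be made concrete with a short sketch. The helper names and the example language (binary words ending in 1) are hypothetical choices for illustration; the empty Python string stands in for the empty word Λ.

```python
from itertools import product

def words_up_to(alphabet, max_len):
    """List every word over the alphabet with length 0..max_len.

    The empty string '' plays the role of the empty word.
    """
    words = []
    for n in range(max_len + 1):
        for symbols in product(sorted(alphabet), repeat=n):
            words.append("".join(symbols))
    return words

def is_word_over(word, alphabet):
    """A string is a word over the alphabet if every symbol belongs to it."""
    return all(symbol in alphabet for symbol in word)

binary = {"0", "1"}
all_words = words_up_to(binary, 2)

# A language is any set of words; this one is a hypothetical example.
language = {w for w in all_words if w.endswith("1")}
print(all_words)
print(sorted(language))
```

Only words of length at most two are enumerated here; the full set of words over an alphabet is infinite, so a language is normally specified by a grammar rather than by listing its members.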

Just as every natural language follows some specific grammar, a formal language is associated with a grammar, which is basically a 4-tuple:

G = (V_N, V_T, P, S)

where
V_N is a set of non-terminals (variables);
V_T is a set of terminals (constants);
P is a set of productions or rewriting rules;
S is the start or root symbol.

S belongs to the set V_N, and V_N and V_T are disjoint sets. V is the union of the sets V_N and V_T. V* denotes the set (free monoid) of all words over V, including the empty word Λ, whereas V+ is the set (free semigroup) V* − {Λ}.

The language generated by G is denoted by L(G). It is the set of strings that satisfy two conditions:
(i) Each string is composed only of terminals (i.e., each string is a terminal sentence).
(ii) Each string can be derived from S by suitable application of productions from the set P.

The set P of productions consists of expressions of the form α → β. The symbol → indicates replacement of the string α by the string β, where α is a string in V+ and β is a string in V*. The set of production formulas is a part of a normal algorithm, a concept that was introduced
by A. A. Markov. The normal algorithm recognizes angles from changes in direction during contour tracking. This is done by the Look Ahead Tracing (LAT) technique.

Grammars differ only in the form of their productions. The various types of grammar are:

Unrestricted Grammar: It has productions of the form α → β, where α is a string in V+ and β is a string in V*.

Context-sensitive Grammar: It has productions of the form α_1 A α_2 → α_1 β α_2, where α_1 and α_2 are in V*, β is in V+, and A is in V_N. This grammar allows replacement of the non-terminal A by the string β only when A appears in the context α_1 A α_2 of the strings α_1 and α_2.

Context-free Grammar: It has productions of the form A → β, where A is in V_N and β is in V+. The name context-free arises from the fact that the variable A may be replaced by a string β regardless of the context in which A appears.

Regular (or Finite-state) Grammar: It has productions of the form A → aB or A → a, where A and B are variables in V_N and a is a terminal in V_T.

These grammars are sometimes called type 0, 1, 2, and 3 grammars, respectively. The basic concepts underlying syntactic pattern
recognition are illustrated by the development of mathematical models of computing machines, called automata. Given an input string, an automaton is capable of recognizing whether the pattern belongs to the language with which the automaton is associated. A finite automaton is defined as the 5-tuple

A_f = (Q, Σ, δ, q_0, F)

where Q is a finite, nonempty set of states, Σ is a finite input alphabet, δ is a mapping from Q × Σ into the collection of all subsets of Q, q_0 is the starting state, and F is a set of final or accepting states.

3.6. Formulation of the Syntactic Pattern Recognition Problem

Suppose that we have two pattern classes W_1 and W_2. Let the patterns of these classes be composed of features from some finite set. We call these features terminals and denote the set by V_T. Here we use the set of constants V_T = {R, DR, D, DL, L, UL, U, UR}, and the root symbol changes depending on the object to be recognized. Note that R, D, L, and U are elementary symbols, and DR, DL, UL, and UR are composite symbols of the alphabet V_T. In syntactic pattern recognition, such terminals are also referred to as primitives. If there exists a grammar G_1
with the property that the language it generates consists of sentences or words (patterns) that belong exclusively to one of the pattern classes, say W_1, then this grammar can be used for pattern classification: a pattern belongs to W_1 if it is a word in L(G_1).

3.7. Syntactic Pattern Description

The object to be recognized in an image is, in this case, a two-dimensional pattern. The strings describing such a pattern can be built by simple juxtaposition of strings to form new strings. Juxtaposition of two strings means placing the objects together without losing the identity of the objects. Concatenation can also be used, but it involves spatial rearrangement as well as a loss of identity on the part of the individual objects. Juxtaposition of structures takes place only at two points, called the head and the tail, which define an arrow. Graph-like patterns can be recognised as two-dimensional patterns, which can then be reduced to an equivalent string representation.

The other useful technique for describing two-dimensional relationships is based on tree structures. A tree is a finite set T of one or more nodes such that:
There is a specially designated node called the root of the tree.
The remaining nodes (excluding the root) are partitioned into m disjoint sets T_1, T_2, ..., T_m, m ≥ 0, where each of these sets is in turn a tree. These trees are called subtrees of the root.

A node of degree zero is called a leaf, while a node of higher degree is called a branch node. The tree representation of a pattern is called a pattern tree. Figure 3.1 shows a sample tree diagram for a given pattern.

Figure 3.1: Tree representation of patterns

A syntax-directed grammar provides a mechanism for determining whether or not a pattern can be generated by a particular grammar. Once the grammars are known, the basic problem is the development of a procedure for determining whether or not a given pattern represents a valid formula, word, or sentence. As outlined earlier, the procedure used in formal language theory to accomplish this is called parsing. Basically, we consider two types of parsing techniques: (i) top-down and (ii) bottom-up. In the top-down approach, the top or root of the (inverted) tree is the start symbol S, and through repeated application of the productions of the grammar one attempts to arrive at the given terminal sentence. The
bottom-up approach, on the other hand, starts with the given sentence and attempts to arrive at the symbol S by applying the productions in reverse. In either case, if the parsing fails, the given pattern represents an incorrect sentence and is therefore rejected.

The parsing process can be further improved by employing the rules of syntax of the grammar [11]. Syntax is defined as the juxtaposition and concatenation of objects. A rule of syntax states some permissible (or prohibited) relation between objects. A syntax-directed parser employs the syntax of the grammar in the parsing process. The syntactic pattern description of various types of objects as the top-down process, and the regeneration of objects from the resulting sentence as the bottom-up approach, are briefly described in the next chapter.
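
The top-down parsing idea described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the recognizer developed in this work: the grammar below is a hypothetical regular grammar over the elementary primitives R, D, L, and U of V_T, describing a closed contour traced as runs of right, down, left, and up moves, and the function name derives is also hypothetical. The sketch assumes the grammar has no empty productions and no cyclic unit productions (such as A → B together with B → A), so that the length check bounds the search.

```python
# Hypothetical regular grammar; every production has the form A -> aB or A -> a.
GRAMMAR = {
    "S": [("R", "S"), ("R", "A")],  # one or more R moves
    "A": [("D", "A"), ("D", "B")],  # one or more D moves
    "B": [("L", "B"), ("L", "C")],  # one or more L moves
    "C": [("U", "C"), ("U",)],      # one or more U moves
}

def derives(symbols, tokens, grammar):
    """Top-down test: can the sequence of grammar symbols derive exactly tokens?

    Starting from the root symbol, the leftmost non-terminal is repeatedly
    replaced by the right-hand side of one of its productions, backtracking
    over the alternatives when a choice fails.
    """
    if not symbols:
        return not tokens
    if len(symbols) > len(tokens):
        # Each symbol yields at least one terminal, so this branch cannot succeed.
        return False
    first, rest = symbols[0], symbols[1:]
    if first in grammar:  # non-terminal: try each production in turn
        return any(derives(tuple(body) + rest, tokens, grammar)
                   for body in grammar[first])
    # Terminal: it must match the next primitive of the input sentence.
    return tokens[0] == first and derives(rest, tokens[1:], grammar)

accepted = derives(("S",), ("R", "R", "D", "L", "L", "U"), GRAMMAR)
rejected = derives(("S",), ("R", "D", "L"), GRAMMAR)
print(accepted, rejected)
```

A sentence for which no derivation from S exists, such as the open contour R D L, is rejected, in line with the rule that a failed parse marks an incorrect sentence.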