CSCI 1010 Models of Computation Lecture 16 The Chomsky Language Hierarchy
Overview
Definitions of phrase structure, context-free and regular languages
Proof that the languages defined by regular grammars and the languages recognized by finite-state machines are the same
Parse trees for CFLs
Converting CFGs to Chomsky normal form
Chomsky Hierarchy
Four language types, each less expressive than the next, each with its own grammar rules:
Regular
Context-Free
Context-Sensitive
Phrase Structure
The Chomsky Hierarchy
Phrase structure languages are the most expressive and are recognized by Turing machines.
Context-sensitive languages are recognized by linear-bounded automata, which are TMs whose space is bounded by $O(|\mathrm{input}|)$.
Context-free languages are recognized by pushdown automata.
Regular languages are recognized by FSMs.
Phrase Structure Languages
Defined by grammars $G = (N, T, R, S)$:
1. $N$ = non-terminals, $T$ = terminals, $V = N \cup T$, and start symbol $S \in N$.
2. Rules $R \subseteq V^+ \times V^*$, where $R$ is finite. For $(a, b) \in R$, $a$ contains at least one non-terminal. Also, $(a, a) \in R$ for all $a$ in $N^+$.
3. If $(a, b) \in R$, we write $a \to b$ and say $b$ is derived from $a$.
4. Let $u \in V^+$ and let $a$ be a substring of $u$. Let $a \to b$. If $v$ is obtained by replacing $a$ by $b$ in $u$, we write $u \Rightarrow_G v$ (immediate derivation of $v$ from $u$).
5. If $u \Rightarrow_G x_1 \Rightarrow_G x_2 \Rightarrow_G \dots \Rightarrow_G x_n \Rightarrow_G v$, we write $u \Rightarrow^*_G v$. Here $\Rightarrow^*_G$ is the transitive closure of $\Rightarrow_G$.
6. The language defined by $G$ is the set of terminal strings derived from $S$ using the rules $R$: $L(G) = \{v \in T^* \mid S \Rightarrow^*_G v\}$.
Context-Sensitive Languages
Context-sensitive grammars are phrase structure grammars in which each rule $(a, b) \in R$ satisfies $|a| \le |b|$.
Context-sensitive languages are generated by context-sensitive grammars.
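Example
$G_1 = (N_1, T_1, R_1, S)$ is context-sensitive, where $N_1 = \{S, B, C\}$, $T_1 = \{a, b, c\}$ and $R_1$ consists of rules (a) through (g). [Rule table for $R_1$ not reproduced.]
Context is important!
$L(G_1)$ contains $aabbcc$, which follows from applying rules (a), (b), (c), (d), (e), (f), and (g) in that order to produce a terminal string:
$S \Rightarrow aSBC \Rightarrow aaBCBC \Rightarrow aaBBCC \Rightarrow aabBCC \Rightarrow aabbCC \Rightarrow aabbcC \Rightarrow aabbcc$.
$S \Rightarrow^* a^n(BC)^n$ is possible using (a) $n-1$ times and then (b). If (c) is not used to produce $a^n B^n C^n$, a substring $cB$ occurs, for which there is no rule. Thus, $L(G_1) = \{a^n b^n c^n \mid n \ge 1\}$.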
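The derivation above can be replayed mechanically. The sketch below is only illustrative: the rule table for $R_1$ is not available in this transcript, so the rules (a) through (g) written here are an assumption inferred from the derivation steps, and the helper names `apply_rule` and `derive` are hypothetical.

```python
# Rules (a)-(g): ASSUMED from the derivation in the text, since the
# original rule table is not reproduced in this transcript.
RULES = {
    "a": ("S", "aSBC"),
    "b": ("S", "aBC"),
    "c": ("CB", "BC"),
    "d": ("aB", "ab"),
    "e": ("bB", "bb"),
    "f": ("bC", "bc"),
    "g": ("cC", "cc"),
}

def apply_rule(u, label):
    """One immediate derivation step: rewrite the leftmost occurrence
    of the rule's left side inside the sentential form u."""
    lhs, rhs = RULES[label]
    i = u.find(lhs)
    if i < 0:
        raise ValueError(f"rule ({label}) does not apply to {u!r}")
    return u[:i] + rhs + u[i + len(lhs):]

def derive(labels, start="S"):
    """Apply the labelled rules in order and return every sentential form."""
    steps = [start]
    for label in labels:
        steps.append(apply_rule(steps[-1], label))
    return steps

steps = derive("abcdefg")
print(" => ".join(steps))
```

Note that rules (c) through (g) each rewrite a two-symbol substring, which is exactly where the context matters: $B$ becomes $b$ only when it follows $a$ or $b$.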
Context-Free Languages
A context-free grammar is a phrase-structure grammar $G = (N, T, R, S)$ in which each rule has a single non-terminal on the left.
Context-free languages are generated by context-free grammars.
Example: $N_2 = \{S\}$, $T_2 = \{\varepsilon, a, b\}$, $R_2 = \{S \to aSb,\ S \to \varepsilon\}$. Then $G_2 = (N_2, T_2, R_2, S)$ is context-free.
$L(G_2) = \{a^n b^n \mid n \ge 0\}$. To see this, apply $S \to aSb$ $n$ times to give $S \Rightarrow^* a^n S b^n$, after which apply $S \to \varepsilon$.
Context-Free Languages
Context-free grammars are widely used to parse large portions of programming languages.
They must be augmented with semantic analysis because programming languages are not fully context-free.
For example, in the statement name1 = name2; name2 could be either a function or a variable, depending on context. A parse tree would be augmented with this type of information.
Regular Languages
A regular grammar is a context-free grammar $G = (N, T, R, S)$ in which the right-hand side of each rule is either a terminal or a terminal followed by a non-terminal. That is, rules are of the form $A \to bC$ or $A \to a$.
Regular languages are generated by regular grammars.
Regular Languages
Example: Let $N_4 = \{S, A, B\}$, $T_4 = \{0, 1\}$, with rules $R_4$. [Rule table for $R_4$ not reproduced.]
The rules of $R_4$ are equivalent to $S \to 0$, $S \to 01B$, $B \to 01B$, $B \to 0$. Thus, the original and new grammars both generate the language $L(G_4) = (01)^*0$.
We now give an FSM that recognizes $L(G_4)$.
Recognizing Regular Languages
Theorem: The regular languages and the languages recognized by FSMs are the same.
Proof: If $G$ is regular, $L(G)$ is recognized by an FSM.
Replace each rule $A \to a$ by the two rules $A \to aF$ and $F \to \varepsilon$, where $F$ is a new non-terminal.
Construct a state for each non-terminal. Insert an edge from state $A$ to state $B$ with label $a$ for each rule $A \to aB$. Make $A$ final if $A \to \varepsilon$.
This FSM accepts $w$ such that $S \Rightarrow^* wB$ where $B \to \varepsilon$. It recognizes $L(G)$. The FSM is nondeterministic.
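The construction in this proof can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the lecture's own code: non-terminals and terminals are single characters, the empty string stands for ε, the new final state is named `<F>`, and the five rules in `G4_RULES` are a hypothetical regular grammar for $(01)^*0$ because the lecture's $R_4$ table is not available here.

```python
def grammar_to_nfa(rules, start="S"):
    """Build a nondeterministic FSM from regular-grammar rules.
    rules: list of (A, rhs) with rhs one of "a", "aB", or "" (epsilon)."""
    edges = {}      # (state, symbol) -> set of successor states
    finals = set()
    for lhs, rhs in rules:
        if rhs == "":
            finals.add(lhs)                       # A -> epsilon: A is final
        elif len(rhs) == 1:
            # A -> a is treated as A -> aF, F -> epsilon (F is the new state)
            edges.setdefault((lhs, rhs), set()).add("<F>")
            finals.add("<F>")
        else:
            # A -> aB: edge labelled a from state A to state B
            edges.setdefault((lhs, rhs[0]), set()).add(rhs[1])
    return edges, finals, start

def accepts(nfa, w):
    """Simulate the NFA by tracking the set of reachable states."""
    edges, finals, start = nfa
    current = {start}
    for ch in w:
        current = set().union(*(edges.get((q, ch), set()) for q in current))
    return bool(current & finals)

# Hypothetical regular grammar for (01)*0 (an assumption, see the lead-in).
G4_RULES = [("S", "0A"), ("S", "0"), ("A", "1B"), ("B", "0A"), ("B", "0")]
nfa = grammar_to_nfa(G4_RULES)
print([w for w in ["", "0", "01", "010", "00", "01010"] if accepts(nfa, w)])
```

Tracking a set of current states is what handles the nondeterminism: after reading 0 from `S` the machine is in both `A` and `<F>` at once.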
Recognizing Regular Languages
Proof (cont.): Given an FSM $M$, there is a regular grammar $G$ generating the language recognized by $M$.
Let $G$ have one non-terminal $q_i$ for each state of $M$ and one rule of the form $q_i \to a q_j$ if there is an edge labeled $a$ from $q_i$ to $q_j$. Add the rule $q_i \to \varepsilon$ if $q_i$ is a final state.
The initial state $q_0$ is associated with the start symbol $S$ of $G$.
The set of strings $\{w\}$ that take $M$ from the initial state to a final state is the same as the set of strings generated by $G$ such that $S \Rightarrow^* wB$ where $B \to \varepsilon$.
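The reverse direction is equally mechanical. The sketch below assumes a hypothetical two-state FSM for $(01)^*0$ (states `q0`, `q1`, with `q1` final); the function names and the bounded enumeration in `generate` are illustrative choices, not part of the lecture.

```python
def fsm_to_grammar(edges, finals):
    """One rule q_i -> a q_j per edge, plus q_i -> epsilon per final state.
    A rule is (lhs, rhs), where rhs is (a, q_j) or the empty tuple for epsilon."""
    rules = [(qi, (a, qj)) for qi, a, qj in edges]
    rules += [(q, ()) for q in sorted(finals)]
    return rules

def generate(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    results, frontier = set(), [("", start)]
    while frontier:
        prefix, state = frontier.pop()
        for lhs, rhs in rules:
            if lhs != state:
                continue
            if rhs == ():
                results.add(prefix)                  # q -> epsilon ends the derivation
            elif len(prefix) < max_len:
                frontier.append((prefix + rhs[0], rhs[1]))
    return results

# Hypothetical FSM for (01)*0: q0 --0--> q1, q1 --1--> q0, q1 final.
EDGES = [("q0", "0", "q1"), ("q1", "1", "q0")]
rules = fsm_to_grammar(EDGES, {"q1"})
print(sorted(generate(rules, "q0", 5)))
```

Each sentential form in such a grammar is a terminal prefix followed by one state name, which is why `generate` only needs to carry a `(prefix, state)` pair.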
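Parse Trees for CFLs
Example: $G_3 = (N_3, T_3, R_3, S)$. [Rule table and parse-tree figure not reproduced.]
A derivation of $caacaabcbc$:
$S \Rightarrow cMNc \Rightarrow caMaNc \Rightarrow ca^2Ma^2Nc \Rightarrow ca^2ca^2Nc \Rightarrow ca^2ca^2bNbc \Rightarrow ca^2ca^2bcbc$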
Parse Trees
The yield of a tree is the string of characters at its leaves.
The height of a parse tree is the length of its longest path.
In a leftmost derivation, rules are invoked in depth-first, left-to-right order. A rightmost derivation is similar.
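Yield, height and leftmost order can be computed directly from a tree. The sketch below encodes a parse tree for $caacaabcbc$ as nested tuples; the tree shape is an assumption inferred from the derivation of that string, and height is counted in edges, which is one reasonable reading of "length of the longest path".

```python
# A node is (non_terminal, [children]); a leaf is a terminal string.
# The tree shape is ASSUMED from the derivation of caacaabcbc.
M_TREE = ("M", ["a", ("M", ["a", ("M", ["c"]), "a"]), "a"])
N_TREE = ("N", ["b", ("N", ["c"]), "b"])
TREE = ("S", ["c", M_TREE, N_TREE, "c"])

def tree_yield(node):
    """Concatenate the leaves from left to right."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(child) for child in node[1])

def height(node):
    """Number of edges on the longest root-to-leaf path."""
    if isinstance(node, str):
        return 0
    return 1 + max(height(child) for child in node[1])

def leftmost_rules(node):
    """Rules in the order a leftmost derivation applies them
    (a depth-first, left-to-right walk of the tree)."""
    if isinstance(node, str):
        return []
    label, children = node
    rhs = "".join(c if isinstance(c, str) else c[0] for c in children)
    out = [f"{label} -> {rhs}"]
    for child in children:
        out += leftmost_rules(child)
    return out

print(tree_yield(TREE), height(TREE))
print(leftmost_rules(TREE))
```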
Context-Free Languages (CFLs)
Recall: A context-free grammar (CFG) is a phrase-structure grammar $G = (N, T, R, S)$ in which each rule has only a single non-terminal on the left.
CFLs are generated by context-free grammars.
Example: Let $N_2 = \{S\}$, $T_2 = \{\varepsilon, a, b\}$, $R_2 = \{S \to aSb,\ S \to \varepsilon\}$. Then $G_2 = (N_2, T_2, R_2, S)$ is context-free.
Chomsky Normal Form
A CFG $G = (N, T, R, S)$ is in Chomsky normal form if every rule is of the form $A \to BC$ or $A \to b$ with $b \in T$, except that if $\varepsilon \in L(G)$ then $S \to \varepsilon$ is also a rule.
Theorem: Every CFL $L$ can be generated by a CFG in Chomsky normal form.
Chomsky Normal Form Example
Example: $G_3 = (N_3, T_3, R_3, S)$. A Chomsky normal form grammar generating this language keeps rules (c) and (e) and replaces the others as follows:
(a) $S \to CD$, $C \to c$, $D \to ME$, $E \to NC$
(b) $M \to AF$, $A \to a$, $F \to MA$
(d) $N \to BG$, $B \to b$, $G \to NB$
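The definition of Chomsky normal form is a purely syntactic condition on rules, so it can be checked rule by rule. In the sketch below, symbols are single characters and the empty string stands for ε. Rules (c) and (e) are assumed to be $M \to c$ and $N \to c$, and `G3_ORIGINAL` is likewise an assumption inferred from the derivation of $caacaabcbc$, since the rule table is not available here.

```python
def is_cnf(rules, non_terminals, terminals, start="S"):
    """True if every rule has the form A -> BC, A -> b, or S -> epsilon.
    This only checks the shape of the rules; whether S -> epsilon is
    justified (epsilon in L(G)) is not examined."""
    for lhs, rhs in rules:
        if lhs not in non_terminals:
            return False
        if len(rhs) == 2 and all(x in non_terminals for x in rhs):
            continue                                  # A -> BC
        if len(rhs) == 1 and rhs in terminals:
            continue                                  # A -> b
        if lhs == start and rhs == "":
            continue                                  # S -> epsilon
        return False
    return True

TERMINALS = set("abc")
G3_CNF = [
    ("S", "CD"), ("C", "c"), ("D", "ME"), ("E", "NC"),   # replaces (a)
    ("M", "AF"), ("A", "a"), ("F", "MA"),                # replaces (b)
    ("M", "c"),                                          # (c), assumed
    ("N", "BG"), ("B", "b"), ("G", "NB"),                # replaces (d)
    ("N", "c"),                                          # (e), assumed
]
G3_ORIGINAL = [("S", "cMNc"), ("M", "aMa"), ("M", "c"), ("N", "bNb"), ("N", "c")]

print(is_cnf(G3_CNF, set("SCDEMAFNBG"), TERMINALS))
print(is_cnf(G3_ORIGINAL, set("SMN"), TERMINALS))
```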
Converting to Chomsky Normal Form
Theorem: Every CFL $L$ can be generated by a CFG in Chomsky normal form.
Proof: Let $L$ be generated by $G$. If $\varepsilon \in L$, add $S \to \varepsilon$. Convert $G$ to Chomsky normal form in stages.
a) Eliminate from $G$ all ε-rules of the form $B \to \varepsilon$ (except for $S \to \varepsilon$) as follows: for each rule with at least one $B$ on its right-hand side, e.g. $A \to \alpha B \beta B \gamma$ (where $\alpha, \beta, \gamma$ are strings), add all rules formed by replacing $B$ by $\varepsilon$ in all possible ways, e.g. $A \to \alpha\beta B\gamma$, $A \to \alpha B\beta\gamma$, $A \to \alpha\beta\gamma$, giving four rules for the one original rule.
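The "all possible ways" in stage a) amounts to choosing a subset of the occurrences of $B$ to delete. The sketch below shows only that single expansion step for one rule and one non-terminal; a full ε-elimination would repeat it for every ε-rule, which is not shown. The function name `drop_variants` and the symbols `alpha`, `beta`, `gamma` (standing in for the strings α, β, γ) are illustrative.

```python
from itertools import combinations

def drop_variants(rhs, B):
    """All right-hand sides obtained from rhs (a tuple of symbols) by
    deleting any subset of the occurrences of B, including none."""
    positions = [i for i, x in enumerate(rhs) if x == B]
    variants = set()
    for k in range(len(positions) + 1):
        for chosen in combinations(positions, k):
            variants.add(tuple(x for i, x in enumerate(rhs) if i not in chosen))
    return variants

# A -> alpha B beta B gamma, with single symbols standing in for the strings.
rhs = ("alpha", "B", "beta", "B", "gamma")
for v in sorted(drop_variants(rhs, "B"), key=len, reverse=True):
    print("A ->", " ".join(v))
```

With two occurrences of $B$ there are $2^2 = 4$ subsets, matching the four rules counted above.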
Converting to Chomsky Normal Form
Proof (cont.)
b) For each rule $A \to \alpha w_i \beta$ (where $\alpha, \beta$ are strings) with $w_i \in T$, replace it by $A \to \alpha Z_i \beta$ and add the rule $Z_i \to w_i$, where $Z_i$ is a new non-terminal. Continue until every rule has either a single terminal or a string of non-terminals on the right. This new grammar also generates $L$.
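Stage b) can be sketched as a single pass over the rules. The example rules below are an assumption (a fragment consistent with the derivation of $caacaabcbc$), and the naming scheme `Z_<terminal>` for the new non-terminals is hypothetical; it presumes no existing non-terminal already uses such a name.

```python
def lift_terminals(rules, terminals):
    """Replace each terminal in a right-hand side of length >= 2 by a
    new non-terminal Z_t, and add the rule Z_t -> t.
    rules: list of (lhs, rhs) with rhs a tuple of symbols."""
    new_rules, proxy = [], {}
    for lhs, rhs in rules:
        if len(rhs) >= 2:
            rhs = tuple(
                proxy.setdefault(x, f"Z_{x}") if x in terminals else x
                for x in rhs
            )
        new_rules.append((lhs, rhs))      # rules A -> b are left untouched
    new_rules += [(z, (t,)) for t, z in proxy.items()]
    return new_rules

# ASSUMED example rules: S -> cMNc, M -> aMa, M -> c.
EXAMPLE = [("S", ("c", "M", "N", "c")), ("M", ("a", "M", "a")), ("M", ("c",))]
lifted = lift_terminals(EXAMPLE, {"a", "b", "c"})
for lhs, rhs in lifted:
    print(lhs, "->", " ".join(rhs))
```

Reusing one `Z_t` per terminal, rather than a fresh non-terminal per occurrence, keeps the grammar small and does not change the language generated.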
Converting to Chomsky Normal Form
Proof (cont.) Rules are now of the form:
a) $A \to b$ for $b \in T$,
b) $S \to \varepsilon$,
c) $A \to Z_1 Z_2 \dots Z_k$, for $Z_i \in N$.
Consider rules of type c) with $k = 1$. Cascading such rules gives derivations $A \Rightarrow^* B$; delete all rules of type c) with $k = 1$ and replace them with $A \to B$ if $A \Rightarrow^* B$. The same language is generated.
Converting to Chomsky Normal Form
Proof (cont.) If $C \to D$ and $D \to b$, add $C \to b$, then delete all rules of the form $A \to B$. This generates the same language; all remaining rules are of the form $S \to \varepsilon$, $A \to b$, or $A \to Z_1 Z_2 \dots Z_k$ with $k \ge 2$, $Z_i \in N$.
Now replace each rule of the form $A \to Z_1 Z_2 \dots Z_k$ by the rules $A \to Z_1 N_1$, $N_1 \to Z_2 N_2$, ..., $N_{k-3} \to Z_{k-2} N_{k-2}$, $N_{k-2} \to Z_{k-1} Z_k$, where each $N_i$ is a new non-terminal.
This new grammar is in the correct form and generates $L$. Q.E.D.
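The final stage, splitting long right-hand sides into a chain of two-symbol rules, can be sketched as follows. The fresh names `N1`, `N2`, ... follow the notation of the proof and are assumed not to clash with existing non-terminals; the function name `binarize` is illustrative.

```python
def binarize(rules):
    """Replace each rule A -> Z1 Z2 ... Zk (k > 2) by the chain
    A -> Z1 N1, N1 -> Z2 N2, ..., N(k-2) -> Z(k-1) Zk.
    rules: list of (lhs, rhs) with rhs a tuple of non-terminals."""
    out, counter = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            counter += 1
            fresh = f"N{counter}"             # new non-terminal for the tail
            out.append((lhs, (rhs[0], fresh)))
            lhs, rhs = fresh, rhs[1:]
        out.append((lhs, rhs))                # rules with k <= 2 pass through
    return out

chain = binarize([("A", ("Z1", "Z2", "Z3", "Z4"))])
for lhs, rhs in chain:
    print(lhs, "->", " ".join(rhs))
```

For $k = 4$ this yields three rules and two new non-terminals, in agreement with the $k - 2$ new non-terminals in the proof.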
Example
Let $G = (N, T, R, E)$ be the grammar with $N = \{E, T, F\}$, $T = \{a, b, +, *, (, )\}$ and let $R$ have the following rules: [Rule table not reproduced.]
$E$, $T$, $F$ denote expressions, terms and factors. It is easy to show that $E \Rightarrow^* (a*b+a)*(a+b)$ and $E \Rightarrow^* a*b+a$.
This grammar doesn't have ε-rules. Introduce new non-terminals to stand for the terminals $*$, $($, $)$, and $+$.
Example
Transform as indicated until only non-terminals appear on the right. Then reduce the number of non-terminals on the right to two.
Example
The grammar is now in Chomsky normal form.
Summary
Definitions of phrase structure, context-free and regular languages
Proof that the languages defined by regular grammars and the languages recognized by finite-state machines are the same
Parse trees for CFLs
Converting CFGs to Chomsky normal form