Context-free Grammars
Natural & Programming Languages
Laureates Visit, July 19, 2013

Example of a Programming Language: Go
  - designed at Google (version 1.0 released in 2012)
  - documentation: specifies the syntax, using a context-free grammar
  - tool shipped with Go: YACC
      - generates a parser from a grammar
      - allows for creating, editing, and adapting the syntax of programming languages

Pāṇini (c. 350 BC): Aṣṭādhyāyī
  - Sanskrit grammar
  - about 4000 rules
  - formal rules: A → B / C _ D
      rewrite A to B in the context C _ D
  - auxiliary symbols

Chomsky (1956): Three Models for the Description of Language
  1. finite-state automata
  2. phrase-structure grammars
  3. transformational grammars
  [Photo: N. Chomsky]

Modeling a language
  - set of sentences
  - syntax vs. semantics:
      The child eats a tomato.
      A tomato eats the child.
      *A tomato the child eats.
  - competence vs. performance:
      The child eats a nice tomato.
      The child eats a nice round tomato.
      The child eats a nice red round tomato.
      ...

Constituents Analysis
  [[The child] [eats [a tomato]]].
  [[The child] [eats [a [nice tomato]]]].

Constituents Analysis (ctd.)
  [Parse tree of "The child eats a nice tomato", in bracketed form:]
  [P [NP [det The] [AP [n child]]]
     [VP [v eats] [NP [det a] [AP [adj nice] [AP [n tomato]]]]]]

Context-free Grammars
  Special case of phrase-structure grammars: empty contexts
    P   → NP VP
    NP  → det AP
    VP  → v NP
    AP  → adj AP | n
    det → The | a
    n   → child | tomato
    v   → eats
    adj → nice | red | round

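A minimal Python sketch of how the grammar above can be used to recognize sentences. The dictionary encoding and the names `GRAMMAR` and `recognizes` are illustrative assumptions, not part of the talk. The memoized top-down search used here only works because this particular grammar has no left recursion and no empty rules.

```python
from functools import lru_cache

# Hypothetical encoding: each nonterminal maps to its list of right-hand sides.
GRAMMAR = {
    "P": [["NP", "VP"]],
    "NP": [["det", "AP"]],
    "VP": [["v", "NP"]],
    "AP": [["adj", "AP"], ["n"]],
    "det": [["The"], ["a"]],
    "n": [["child"], ["tomato"]],
    "v": [["eats"]],
    "adj": [["nice"], ["red"], ["round"]],
}


def recognizes(sentence, start="P"):
    """Return True if the whitespace-separated sentence derives from `start`."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        """All positions j such that `symbol` derives words[i:j]."""
        if symbol not in GRAMMAR:  # terminal: must match the next word
            if i < len(words) and words[i] == symbol:
                return frozenset({i + 1})
            return frozenset()
        result = set()
        for rhs in GRAMMAR[symbol]:
            positions = {i}
            for sym in rhs:  # thread the end positions through the rule
                positions = {k for j in positions for k in ends(sym, j)}
            result |= positions
        return frozenset(result)

    return len(words) in ends(start, 0)
```

With this lexicon, `recognizes("The tomato eats a child")` holds even though the sentence is semantically odd, while `recognizes("a tomato The child eats")` does not: the grammar captures syntax only.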
Backus (1959); Naur (1960): Algol 60
  - ALGOrithmic Language
  - standard syntax:
      <statement> ::= <unconditional statement> | <conditional statement>
      <unconditional statement> ::= <for statement> | ...
      <conditional statement> ::= <if statement> | <if statement> else <statement>
      <if statement> ::= <if clause> <unconditional statement>
      <if clause> ::= if <boolean expression> then
  [Photo: J. Backus]

Ginsburg and Rice (1962): Two families of languages related to ALGOL
  - connection between Algol and Chomsky's work
  - multidisciplinary research:
      - linguistics
      - programming languages
      - theoretical computer science
    (Chomsky, 1959; Bar-Hillel et al., 1961; Chomsky and Schützenberger, 1963, ...)
  [Photos: Y. Bar-Hillel, M.P. Schützenberger]

Pushdown Automata
  Yngve (1960); Oettinger (1961); Chomsky (1962)
  - operational model, easy implementation
  - expressivity equivalent to that of context-free grammars
  - idea of parsing: generate a pushdown automaton from a grammar

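Pushdown Automata (ctd.)
  Transitions (state, input read, symbol popped, symbols pushed, next state):
    (q, ε, ⊥, ε, q_f)
    (q, ε, P, NP VP, q)
    (q, ε, NP, det AP, q)
    ...
    (q, ε, det, the, q)
    (q, ε, det, a, q)
    (q, the, the, ε, q)
    (q, a, a, ε, q)
    ...
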
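The construction sketched by these transitions can be simulated directly: a nonterminal on top of the stack is replaced by one of its right-hand sides, and a terminal on top of the stack is popped when it matches the next input word. The Python below is a hedged sketch, not the talk's implementation; the names `GRAMMAR`, `BOTTOM` and `pda_accepts`, the choice of "⊥" as bottom-of-stack marker, and the exhaustive search over configurations are all assumptions made for illustration.

```python
# Hypothetical encoding of the example grammar (nonterminal -> right-hand sides).
GRAMMAR = {
    "P": [["NP", "VP"]],
    "NP": [["det", "AP"]],
    "VP": [["v", "NP"]],
    "AP": [["adj", "AP"], ["n"]],
    "det": [["The"], ["a"]],
    "n": [["child"], ["tomato"]],
    "v": [["eats"]],
    "adj": [["nice"], ["red"], ["round"]],
}

BOTTOM = "⊥"  # assumed bottom-of-stack marker


def pda_accepts(grammar, start, words):
    """Explore all runs of the one-state pushdown automaton built from `grammar`."""
    # A configuration is (input position, stack), with the stack top at index 0.
    initial = (0, (start, BOTTOM))
    todo, seen = [initial], {initial}
    while todo:
        pos, stack = todo.pop()
        top, rest = stack[0], stack[1:]
        if top == BOTTOM:
            # Popping the bottom marker leads to the final state; accept
            # only if the whole input has been read.
            if pos == len(words):
                return True
            continue
        if top in grammar:
            # Expand: pop a nonterminal, push one of its right-hand sides.
            successors = [(pos, tuple(rhs) + rest) for rhs in grammar[top]]
        elif pos < len(words) and words[pos] == top:
            # Match: pop a terminal while reading the same word.
            successors = [(pos + 1, rest)]
        else:
            successors = []
        for config in successors:
            if config not in seen:
                seen.add(config)
                todo.append(config)
    return False
```

The search terminates on this grammar because every expansion is followed by a match before the same nonterminal can be expanded again; a left-recursive grammar would need a different strategy.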
Issues
  - Cantor (1962); Floyd (1962a): Algol 60 is ambiguous: several possible analyses for some programs
      - inherently ambiguous languages (Parikh, 1966; Ginsburg and Ullian, 1966)
      - undecidable properties
  - Floyd (1962b): Algol 60 is not context-free:
      begin real x; y := 3 end
    is only correct if the two identifiers x and y are the same.
      - separation into lexical analysis, parsing, and semantic analysis
  - the first parsers impose very stringent restrictions on grammars (Irons, 1961)
      - ideally: deterministic pushdown automata (Ginsburg and Greibach, 1966)
      - cannot be derived from every grammar
      - undecidable properties
  [Photos: R.W. Floyd, R. Parikh, S. Greibach]

... and Answers
  parser generators for larger and larger classes of grammars
  - Knuth (1965): LR parsing for all the deterministic languages
  - DeRemer (1969): simplifications (SLR & LALR)
  - YACC (Johnson, 1975): LALR(1) parser generator
  [Photo: D.E. Knuth]

Today
  All the mainstream programming languages are shipped with
  - a context-free grammar that specifies their syntax
  - a parser generator (most likely a YACC variant) for writing parsers for new languages

Syntax Models
  - context-free grammars (rewriting systems)
  - pushdown automata (transition systems)
  - algebraic equations (systems of equations)
  - categorial grammars (proof systems)
  - dynamic logic on trees (model theory)

Algebraic Equations (Ginsburg and Rice, 1962; Chomsky and Schützenberger, 1963)
  Minimal solutions of a system:
    P   = NP VP
    NP  = det AP
    VP  = v NP
    AP  = adj AP ∪ n
    det = {The} ∪ {a}
    n   = {child} ∪ {tomato}
    v   = {eats}
    adj = {nice} ∪ {round} ∪ {red}

Categorial Grammars (Bar-Hillel, 1953; Lambek, 1958)
  Categories built using left and right quotients over a finite set of symbols A:
    γ ::= A | γ₁\γ₂ | γ₁/γ₂    (categories)
  Deduction rules:
    (Lexicon)  w ⊢ γ
    (\)        from w₁ ⊢ γ₁ and w₂ ⊢ γ₁\γ₂, deduce w₁w₂ ⊢ γ₂
    (/)        from w₁ ⊢ γ₂/γ₁ and w₂ ⊢ γ₁, deduce w₁w₂ ⊢ γ₂
  [Photo: J. Lambek]

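Proofs Example
  Lexicon: The ⊢ NP/n, child ⊢ n, eats ⊢ (NP\P)/NP, a ⊢ NP/n, tomato ⊢ n
    (/)  The ⊢ NP/n and child ⊢ n give The child ⊢ NP
    (/)  a ⊢ NP/n and tomato ⊢ n give a tomato ⊢ NP
    (/)  eats ⊢ (NP\P)/NP and a tomato ⊢ NP give eats a tomato ⊢ NP\P
    (\)  The child ⊢ NP and eats a tomato ⊢ NP\P give The child eats a tomato ⊢ P
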
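The two deduction rules can be applied bottom-up over all spans of a sentence, in the style of a chart parser. The Python below is a sketch under stated assumptions: the tuple encoding of categories, the helper names `over`, `under` and `derive`, and the chart strategy are illustrative choices, not taken from the talk. The verb is encoded as a category that first takes an NP on its right, then an NP on its left, and yields P.

```python
def over(result, arg):
    """Category result/arg: expects `arg` on its right."""
    return ("/", result, arg)


def under(arg, result):
    """Category arg\\result: expects `arg` on its left."""
    return ("\\", arg, result)


# Hypothetical lexicon for the running example.
LEXICON = {
    "The": {over("NP", "n")},
    "a": {over("NP", "n")},
    "child": {"n"},
    "tomato": {"n"},
    "eats": {over(under("NP", "P"), "NP")},
}


def derive(words):
    """Return the set of categories derivable for the whole word sequence."""
    n = len(words)
    if n == 0:
        return set()
    chart = {}
    for i, w in enumerate(words):  # lexicon rule
        chart[i, i + 1] = set(LEXICON.get(w, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = set()
            for j in range(i + 1, k):
                for left in chart[i, j]:
                    for right in chart[j, k]:
                        # (/) rule: left = result/arg, right = arg
                        if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
                            cell.add(left[1])
                        # (\) rule: left = arg, right = arg\result
                        if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
                            cell.add(right[2])
            chart[i, k] = cell
    return chart[0, n]
```

A sentence is accepted when `"P"` is among the categories returned by `derive`.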
Logics on Trees (Blackburn et al., 1993; Afanasiev et al., 2005)
  Modal logic on a set of atomic propositions p:
    ϕ ::= ⊤ | p | ¬ϕ | ϕ₁ ∧ ϕ₂ | ⟨π⟩ϕ    (formulæ)
    π ::= [relation symbols not recoverable]    (relations)
  [Photo: P. Blackburn]

Models
  An ordered finite labeled tree t, evaluated at a node n:
    t,n ⊨ ⊤
    t,n ⊨ p          if the label of n is p
    t,n ⊨ ¬ϕ         if t,n ⊭ ϕ
    t,n ⊨ ϕ₁ ∧ ϕ₂    if t,n ⊨ ϕ₁ and t,n ⊨ ϕ₂
    t,n ⊨ ⟨π⟩ϕ       if ∃n′, n π n′ and t,n′ ⊨ ϕ

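The satisfaction clauses above translate almost line by line into a recursive evaluator. The Python below is only a sketch: the `Node` class, the tuple encoding of formulas, and in particular the choice of relations ("down" for child, "right" for next sibling, and a reflexive-transitive closure written `("star", π)`) are assumptions made for illustration, since the talk's exact set of relations is not given here.

```python
class Node:
    """A node of an ordered finite labeled tree."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def successors(n, rel):
    """Nodes m such that n is related to m by `rel` (assumed relations)."""
    if rel == "down":  # children
        return list(n.children)
    if rel == "right":  # next sibling, if any
        if n.parent is None:
            return []
        siblings = n.parent.children
        i = next(k for k, s in enumerate(siblings) if s is n)
        return siblings[i + 1:i + 2]
    if isinstance(rel, tuple) and rel[0] == "star":  # reflexive-transitive closure
        seen, todo = {n}, [n]
        while todo:
            for m in successors(todo.pop(), rel[1]):
                if m not in seen:
                    seen.add(m)
                    todo.append(m)
        return list(seen)
    raise ValueError(f"unknown relation: {rel!r}")


def holds(n, phi):
    """Decide whether formula `phi` is satisfied at node n."""
    op = phi[0]
    if op == "top":
        return True
    if op == "prop":
        return n.label == phi[1]
    if op == "not":
        return not holds(n, phi[1])
    if op == "and":
        return holds(n, phi[1]) and holds(n, phi[2])
    if op == "dia":  # ("dia", rel, psi): some rel-successor satisfies psi
        return any(holds(m, phi[2]) for m in successors(n, phi[1]))
    raise ValueError(f"unknown operator: {op!r}")


# Parse tree of "The child eats a tomato".
TREE = Node("P", [
    Node("NP", [Node("det", [Node("The")]), Node("AP", [Node("n", [Node("child")])])]),
    Node("VP", [
        Node("v", [Node("eats")]),
        Node("NP", [Node("det", [Node("a")]), Node("AP", [Node("n", [Node("tomato")])])]),
    ]),
])
```

For instance, `holds(TREE, ("dia", "down", ("and", ("prop", "NP"), ("dia", "right", ("prop", "VP")))))` checks that the root has an NP child immediately followed by a VP sibling, which mirrors the rule P → NP VP.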
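Formulæ Example
  [Formula not fully recoverable from the extracted text. It starts from P and contains one conjunct per grammar rule, among them conjuncts for P with (NP VP), AP with (adj AP) or (n), and det with (The) or (a), ...]
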
References
Afanasiev, L., Blackburn, P., Dimitriou, I., Gaiffe, B., Goris, E., Marx, M., and de Rijke, M., 2005. PDL for ordered trees. Journal of Applied Non-Classical Logics, 15(2):115-135. doi:10.3166/jancl.15.115-135.
Aho, A.V., Johnson, S.C., and Ullman, J.D., 1975. Deterministic parsing of ambiguous grammars. Communications of the ACM, 18(8):441-452. doi:10.1145/360933.360969.
Backus, J.W., 1959. The syntax and semantics of the proposed international algebraic language of the Zürich ACM-GAMM Conference. In IFIP Congress, pages 125-131.
Bar-Hillel, Y., Perles, M., and Shamir, E., 1961. On formal properties of simple phrase-structure grammars. Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunikationsforschung, 14:143-172.
Bar-Hillel, Y., 1953. A quasi-arithmetical notation for syntactic description. Language, 29(1):47-58. doi:10.2307/410452.
Blackburn, P., Gardent, C., and Meyer-Viol, W., 1993. Talking about trees. In EACL '93, pages 21-29. ACL Press. doi:10.3115/976744.976748.
Cantor, D.G., 1962. On the ambiguity problem of Backus systems. Journal of the ACM, 9(4):477-479. doi:10.1145/321138.321145.
Chomsky, N., 1956. Three models for the description of language. IEEE Transactions on Information Theory, 2(3):113-124. doi:10.1109/tit.1956.1056813.
Chomsky, N., 1959. On certain formal properties of grammars. Information and Control, 2(2):137-167. doi:10.1016/s0019-9958(59)90362-6.
Chomsky, N., 1962. Context-free grammars and pushdown storage. Quarterly Progress Report 65, Research Laboratory of Electronics, M.I.T.
Chomsky, N. and Schützenberger, M.P., 1963. The algebraic theory of context-free languages. In Braffort, P. and Hirschberg, D., editors, Computer Programming and Formal Systems, volume 35 of Studies in Logic, pages 118-161. North-Holland Publishing. doi:10.1016/s0049-237x(08)72023-8.
DeRemer, F.L., 1969. Practical Translators for LR(k) Languages. PhD thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts. http://www.lcs.mit.edu/publications/pubs/pdf/mit-lcs-tr-065.pdf.
Earley, J., 1975. Ambiguity and precedence in syntax description. Acta Informatica, 4(2):183-192. doi:10.1007/bf00288747.
Floyd, R.W., 1962a. On ambiguity in phrase structure languages. Communications of the ACM, 5(10):526. doi:10.1145/368959.368993.
Floyd, R.W., 1962b. On the nonexistence of a phrase structure grammar for ALGOL 60. Communications of the ACM, 5(9):483-484. doi:10.1145/368834.368898.
Ginsburg, S. and Rice, H.G., 1962. Two families of languages related to ALGOL. Journal of the ACM, 9(3):350-371. doi:10.1145/321127.321132.
Ginsburg, S. and Greibach, S., 1966. Deterministic context-free languages. Information and Control, 9(6):620-648. doi:10.1016/s0019-9958(66)80019-0.
Ginsburg, S. and Ullian, J., 1966. Ambiguity in context free languages. Journal of the ACM, 13(1):62-89. doi:10.1145/321312.321318.
Irons, E.T., 1961. A syntax directed compiler for ALGOL 60. Communications of the ACM, 4(1):51-55. doi:10.1145/366062.366083.
Johnson, S.C., 1975. YACC: yet another compiler compiler. Computing science technical report 32, AT&T Bell Laboratories, Murray Hill, New Jersey.
Knuth, D.E., 1965. On the translation of languages from left to right. Information and Control, 8(6):607-639. doi:10.1016/s0019-9958(65)90426-2.
Lambek, J., 1958. The mathematics of sentence structure. American Mathematical Monthly, 65(3):154-170. doi:10.2307/2310058.
Naur, P., editor, 1960. Report on the algorithmic language ALGOL 60. Communications of the ACM, 3(5):299-314. doi:10.1145/367236.367262.
Oettinger, A.G., 1961. Automatic syntactic analysis and the pushdown store. In Structure of Language and its Mathematical Aspects, volume 12 of Proc. of Symposia in Applied Math., pages 104-129. AMS.
Parikh, R.J., 1966. On context-free languages. Journal of the ACM, 13(4):570-581. doi:10.1145/321356.321364.
Yngve, V.H., 1960. A model and an hypothesis for language structure. Proceedings of the American Philosophical Society, 104(5):444-466.