FORMAL METHODS II: FORMAL LANGUAGES. September 20, 2013 Rolf Pfeifer Rudolf M. Füchslin

Grammars and Languages
Natural language:
+ High expressiveness
+ No extra learning
- Ambiguity
- Vagueness
- Long-winded style
- Consistency hard to check
Formal language:
+ Well-defined syntax
+ Unambiguous semantics
+ Can be processed by computer
+ Large problems can be solved
- High learning effort
- Limited expressiveness
- Low acceptance

Natural and Formal Languages Natural languages are evolved. Formal languages are constructed. Humans tend to design in a modular manner: the resulting structures are comprehensible. This comprehensibility supports rational planning and extensibility. Evolution has no rationale: solutions only need to be effective, not necessarily comprehensible. Evolution can only perform optimizations that immediately yield a benefit; it cannot pursue, e.g., a "platform strategy" that deliberately facilitates future extensions. The evolutionary approach yields efficient and yet robust solutions.

Evolution of Natural Languages

Evolution of Programming Languages

SYNTAX

Natural Languages Have Structure Words can be categorized.

Natural Languages Have Structure There are higher order structures.

Natural Languages Have Structure Sentences are represented as tree-like structures.

Syntax and Syntax Trees
Tree-like structures can be constructed by replacement rules.
I → Clause Punc
Clause → Subject Verb Object
Subject → Determ Noun
Object → Determ Noun
Verb → chews
Determ → the | a
Noun → dog | bone
Punc → .
The symbol | indicates a choice. Example: a Noun can be replaced either by dog or by bone.

Syntax and Syntax Trees
Sentences generated by the replacement rules:
The dog chews a bone.
A dog chews the bone.
A bone chews a dog.
Step-by-step construction of one sentence:
1. I
2. Clause Punc
3. Clause .
4. Subject Verb Object .
5. Determ Noun Verb Object .
6. the Noun Verb Object .
7. the bone Verb Object .
8. the bone Verb Determ Noun .
9. the bone Verb a Noun .
10. the bone Verb a dog .
11. the bone chews a dog .

Syntax Trees Informal Description We have a set of symbols, some red, some green. We have a start symbol I. Replacement rules give substitutions for red symbols, either by other red symbols or by green symbols. Green symbols cannot be replaced. One proceeds until no red symbols are left. In the dog-and-bone rules, the red symbols are those on the left-hand side of a rule (I, Clause, Subject, Object, Verb, Determ, Noun, Punc); the green symbols are the words and the full stop.

Syntax Trees Informal Description
First sequence of rule applications:
1. I
2. Clause Punc
3. Clause .
4. Subject Verb Object .
5. Determ Noun Verb Object .
6. the Noun Verb Object .
7. the bone Verb Object .
8. the bone Verb Determ Noun .
9. the bone Verb a Noun .
10. the bone Verb a dog .
11. the bone chews a dog .
Second sequence of rule applications:
1. I
2. Clause Punc
3. Clause .
4. Subject Verb Object .
5. Subject Verb Determ Noun .
6. Subject Verb Determ dog .
7. Determ Noun Verb Determ dog .
8. the Noun Verb Determ dog .
9. the Noun Verb a dog .
10. the bone Verb a dog .
11. the bone chews a dog .
Several sequences of applications of replacement rules lead to the same sentence / syntax tree.
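
The replacement process can be tried out in a few lines of Python. The following is only a sketch: the dictionary layout and the names RULES, expand and leftmost_derivation are illustrative assumptions, not part of the lecture. It always replaces the leftmost replaceable symbol, which is one of the several possible orders.

```python
from itertools import product

# Replacement rules of the dog-and-bone example.
# Keys are the replaceable (red) symbols; each value lists the alternatives.
RULES = {
    "I": [["Clause", "Punc"]],
    "Clause": [["Subject", "Verb", "Object"]],
    "Subject": [["Determ", "Noun"]],
    "Object": [["Determ", "Noun"]],
    "Verb": [["chews"]],
    "Determ": [["the"], ["a"]],
    "Noun": [["dog"], ["bone"]],
    "Punc": [["."]],
}

def expand(symbol):
    """Return every word sequence that can be produced from `symbol`."""
    if symbol not in RULES:  # green symbol: cannot be replaced
        return [[symbol]]
    results = []
    for alternative in RULES[symbol]:
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results

def leftmost_derivation(picks):
    """Replace the leftmost red symbol in each step.

    `picks` supplies the index of the alternative whenever a rule offers a choice.
    """
    picks = iter(picks)
    form = ["I"]
    steps = [form]
    while any(s in RULES for s in form):
        i = next(k for k, s in enumerate(form) if s in RULES)
        alternatives = RULES[form[i]]
        choice = alternatives[next(picks)] if len(alternatives) > 1 else alternatives[0]
        form = form[:i] + choice + form[i + 1:]
        steps.append(form)
    return steps

sentences = {" ".join(words) for words in expand("I")}
steps = leftmost_derivation([0, 1, 1, 0])  # choices: the, bone, a, dog
for number, form in enumerate(steps, start=1):
    print(number, " ".join(form))
```

Because no rule refers back to itself, the set of sentences is finite here; the sketch would not terminate for recursive rules.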

Recursive Rules
Subjects/Objects may contain many adjectives: The little young white dog...
Possible rules to handle such constructs:
Subject → Determ ANoun
Object → Determ ANoun
ANoun → Noun | AC Noun
AC → little | white | young | little young | little white | young white | little young white
Noun → dog | bone
The more adjectives, the more cumbersome the rules!

Recursive Rules
To keep rule tables small, recursive rules can be defined:
Subject → Determ Noun
Object → Determ Noun
Noun → Adjective Noun | dog | bone
Adjective → little | white | young
Problem: These rules allow constructs such as "the white white little white white white dog".
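
The effect of the recursive rule Noun → Adjective Noun | dog | bone can be checked with a small recursive function. This is a sketch under the assumption that a phrase is given as a list of words; the names ADJECTIVES, NOUNS and is_noun are illustrative.

```python
ADJECTIVES = {"little", "white", "young"}
NOUNS = {"dog", "bone"}

def is_noun(words):
    """True if `words` can be produced from Noun -> Adjective Noun | dog | bone."""
    if len(words) == 1:
        return words[0] in NOUNS  # non-recursive alternatives: dog | bone
    # Recursive alternative: one adjective followed by something that is again a Noun.
    return len(words) > 1 and words[0] in ADJECTIVES and is_noun(words[1:])

print(is_noun("little young white dog".split()))
print(is_noun("white white little white white white dog".split()))
```

Both calls succeed: the function mirrors the rule exactly, so it also accepts the repeated adjectives mentioned as a problem.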

Theory of Formal Languages The theory of formal languages investigates sets of structured sequences of characters (P. Rechenberg). Structure will be precisely defined. The structure in the theory of formal languages is deterministic: there is no stochastic element.

Strings
There are strings and strings:
dkjfhd Asdf Nyuh lkjugty ^45 dfd @EcYTG  (probably a random string)
ABABABABABABABABABABABABABABABAB  (a neatly ordered string with local structure)
ABAABAAABAAAABAAAAABAAAAAAB  (a string with simple but non-local structure)
It's Friday morning.  (a string with semantic meaning)
Strč prst skrz krk.  (a Czech tongue twister)

Structure and Meaning Using increasingly complex formal means, increasingly complex notions of Structure can be defined. Meaning is a more elusive concept. Open debate: Can Meaning be explained by structure?

How to Proceed In this lecture, focus is on grammars that generate formal languages. We first define what we understand by a formal language and then proceed to the definition of grammars. Automata that recognize the elements of a formal language are discussed later.

FORMAL LANGUAGES CONCEPTS AND DEFINITIONS

Languages Express meaning by sentences (words): "Don't smoke". Alternative: use a pictogram. Short messages: pictograms are probably more efficient. Long messages: words composed of characters are more efficient.

ALPHABETS, STRINGS AND LANGUAGES

Definition: Alphabet
An alphabet is a finite set; its elements are called characters. Characters can be letters, but also symbols or even words.
Σ1 = {a, b, c}
Σ2 = {0, 1}
Σ3 = { , }
Σ4 = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}
Σ5 = {'is', 'sunny', 'rainy', 'the', 'today', 'tomorrow', 'weather', 'yesterday'}

Definition: Strings
A string is an ordered sequence of characters. Some usual abbreviations are:
ε: the empty string
a^0 = ε, a^n = a^(n-1) a (n > 0): exponentiation of a character a ∈ Σ
(c1 c2 ... cn)^R = cn c(n-1) ... c1: reflection of a string
|α|: length of the string α; |ε| = 0
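
The usual string abbreviations map directly onto Python strings. The sketch below assumes that characters are single Python characters and that ε is the empty string ""; the helper names power and reflect are illustrative.

```python
EPSILON = ""  # the empty string

def power(a, n):
    """a^n, following a^0 = epsilon and a^n = a^(n-1) a for n > 0."""
    return EPSILON if n == 0 else power(a, n - 1) + a

def reflect(alpha):
    """Reflection: (c1 c2 ... cn)^R = cn ... c2 c1."""
    return alpha[::-1]

print(power("a", 3))       # exponentiation of the character a
print(reflect("abc"))      # reflection of the string abc
print(len(EPSILON))        # length of the empty string
```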

Definition: Kleene-Star
Given an alphabet Σ. The Kleene star of Σ, written Σ*, is the set of all finite concatenations of elements of Σ, plus the empty string ε (which is not in Σ). Σ* can be defined recursively:
1. Basis: ε ∈ Σ*.
2. Recursive step: If α ∈ Σ* and c ∈ Σ, then cα ∈ Σ*.
3. Closure: β ∈ Σ* only if it can be produced from ε by finitely many applications of the recursive step.
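
The recursive definition can be turned into a generator that lists the Kleene star of an alphabet level by level. This is a sketch that assumes a non-empty alphabet of single characters; the name kleene_star is illustrative. Since the set is infinite, only a finite prefix is taken.

```python
from itertools import islice

def kleene_star(sigma):
    """Yield the strings over `sigma` in order of increasing length.

    Assumes a non-empty alphabet; the generator never finishes on its own.
    """
    level = [""]  # basis: the empty string belongs to the Kleene star
    while True:
        yield from level
        # Recursive step: prefix every known string alpha with every character c.
        level = [c + alpha for c in sorted(sigma) for alpha in level]

first = list(islice(kleene_star({"0", "1"}), 7))
print(first)
```

Every string is reached after finitely many recursive steps, which is exactly the closure condition.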

Definition: The + -Notation Given an alphabet Σ. Σ+ is the set of all non-empty, finite strings produced with characters from Σ. Σ+ ∪ {ε} = Σ*

Definition: Formal Language
A formal language L over an alphabet Σ is a subset of Σ*: L ⊆ Σ*.
Some trivial languages:
L = ∅: the empty language.
L = {ε}: the language consisting of the empty string.
L = Σ*: the universal language (German: Allsprache).
Elements of a language are often called sentences in theoretical computer science and words in mathematics.

A Note on the Empty Language L = ∅: the empty language. L = {ε}: the language consisting of the empty string. The difference between these two languages can be illustrated with a metaphor: having an empty bank account is not the same as having no bank account at all, though in both cases one has no money.

Definition: Operations on Languages
Languages are sets. Consequently, they can be subject to set operations (L, M are both languages over Σ):
The union of two languages: L ∪ M = {α : (α ∈ L) ∨ (α ∈ M)}
The intersection of two languages: L ∩ M = {α : (α ∈ L) ∧ (α ∈ M)}
The concatenation of two languages: LM = {αβ : (α ∈ L) ∧ (β ∈ M)}
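
For finite languages, these operations can be tried directly with Python sets. The sketch assumes that sentences are Python strings; the example languages L and M and the helper name concat are made up for illustration. It also shows that the empty language and the language {ε} behave differently under concatenation.

```python
def concat(L, M):
    """Concatenation LM: every sentence of L followed by every sentence of M."""
    return {alpha + beta for alpha in L for beta in M}

L = {"a", "ab"}
M = {"b", ""}            # "" plays the role of the empty string

print(L | M)             # union
print(L & M)             # intersection
print(concat(L, M))      # concatenation
print(concat(L, set()))  # concatenation with the empty language
print(concat(L, {""}))   # concatenation with the language containing only the empty string
```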

Examples of Formal Languages

How To Define Languages?
The sets have to be described somehow:
One can simply enumerate all sentences.
Languages can be generated by grammars.
A language can be defined by giving an automaton that recognizes its elements.
The elements of a language can be given by a specification of properties: L = {α : α ∈ Σ* ∧ P(α)}. P(α) is a proposition about α. (The difference to the automaton is that specifying properties and specifying how they are checked is not the same thing.)
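
A definition by properties can be illustrated by filtering all strings up to a given length with a predicate P. This is a sketch only: the length bound, the helper name language_up_to and the example property (palindromes) are assumptions made for the illustration.

```python
from itertools import product

def language_up_to(sigma, P, max_len):
    """All strings over `sigma` of length <= max_len that satisfy the property P."""
    result = []
    for n in range(max_len + 1):
        for chars in product(sorted(sigma), repeat=n):
            alpha = "".join(chars)
            if P(alpha):
                result.append(alpha)
    return result

def is_palindrome(alpha):
    """Example property: alpha equals its own reflection."""
    return alpha == alpha[::-1]

palindromes = language_up_to({"a", "b"}, is_palindrome, 2)
print(palindromes)
```

The predicate only says which strings belong to the language; the brute-force filter is just one way of checking it and not necessarily an efficient one.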

Comment Languages can be generated by grammars. A language can be defined by giving an automaton that recognizes its elements. Native speakers, when checking the correctness of a sentence, usually just check whether they would say it the same way; that is, they try out whether they can reconstruct the sentence (verification by reproduction). Only when one starts to learn a language does one analyze a sentence and check its compatibility with abstract rules (whether a memorized grammar automaton accepts it).

GRAMMARS

Definition: Grammar
A grammar G is defined as a quadruple G = (Σ, V, P, S) with
Σ: a finite set of terminal symbols (alphabet).
V: a finite set of non-terminal symbols (variables), usually with the condition (Σ ∩ V) = ∅.
P: a finite set of production rules.
S ∈ V: the start symbol.

Production Rules
Production rules are basically rules for substituting substrings of a given string. The most general form of production rules is structured like this: the left-hand side of a rule p ∈ P is an element of (Σ ∪ V)* V (Σ ∪ V)*, i.e. a non-terminal symbol surrounded by strings L, R ∈ (Σ ∪ V)*, and the right-hand side is an element of (Σ ∪ V)*.
Further requirements on the structure of production rules define types of languages.
Note: the V guarantees that there is at least one non-terminal symbol on the left-hand side of a production rule.
Note: the Kleene star contains by definition the empty string, so R and L may be empty.

Grammars: Comments A grammar is a finite set of production rules. A grammar G generates a language L(G). L can have infinitely many sequences. The rules of G have to be applied until no non-terminal symbol is present anymore. Restrictions on production rules define classes of grammars. A sequence of rule applications is called a derivation.
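
How a grammar generates a language can be sketched with a small breadth-first search over derivations. Everything concrete here is an assumption made for illustration: symbols are single characters, the class name Grammar and the function generate are hypothetical, and the toy grammar S → aSb | ab is not taken from the lecture. The length bound relies on the assumption that no rule shortens a string.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset     # the alphabet of terminal symbols
    nonterminals: frozenset  # the variables
    productions: tuple       # pairs (left, right) of strings
    start: str               # the start symbol

def generate(g, max_len):
    """Sentences of L(g) with at most max_len characters.

    Assumes that no production shortens a string, so longer strings can be pruned.
    """
    seen, queue, sentences = {g.start}, deque([g.start]), set()
    while queue:
        form = queue.popleft()
        if all(c in g.terminals for c in form):
            sentences.add(form)  # no non-terminal left: the derivation is finished
            continue
        for left, right in g.productions:
            pos = form.find(left)
            while pos != -1:     # apply the rule at every position where it fits
                new = form[:pos] + right + form[pos + len(left):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(left, pos + 1)
    return sorted(sentences, key=lambda s: (len(s), s))

# Illustrative toy grammar (not from the lecture): S -> aSb | ab
toy = Grammar(frozenset("ab"), frozenset("S"), (("S", "aSb"), ("S", "ab")), "S")
print(generate(toy, 6))
```

The toy language is infinite although the grammar has only two rules; the bound max_len merely cuts the enumeration off.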

Grammar: Example

Definition: Grammar Tree A grammar tree is a tree where each link corresponds to the application of one particular production rule, and where the leaves represent the elements of the language. The path from the root element to a leaf corresponds to the derivation of that element. (Note: A grammar tree may be infinite.)

Definition: Grammar Tree
Σ: {0, 1}
V: {S, N}
Start symbol: S
S → N
N → 0 | 1 | NN
A syntax tree has characters as leaves, a grammar tree whole sentences.

Grammars and Automata We analyze specific languages as formal languages partly because there are automata that recognize their elements. Examples: file globbing, regular expressions, parsing programs.

TYPES OF LANGUAGES THE CHOMSKY HIERARCHY PART I

Types of Languages Languages can be categorized according to the structure of their production rules. The American philosopher and linguist Noam Chomsky introduced a classification which turned out to be easy to use and which represents fundamental differences between specific languages.

Regular Languages

Definition: Regular Grammars
The production rules of a right-regular grammar have the form:
A → a
A → aB
with A, B ∈ V and a ∈ Σ.
Of course, there can be many rules of these types, depending on the size of V and Σ.

Regular Grammars: Comments Informal description: Regular grammars produce strings by appending. From a physical point of view, they produce discrete time series, where the future is, up to well-defined choices, determined by the past. Once made, a choice cannot be taken back. A regular language is a language produced by a regular grammar.

Regular Languages: Examples S as A A ba

Regular Grammars: Examples Regular grammars seem to produce sequences based on local rules. Is there a regular grammar for binary strings in which the number of 1s is a multiple of three?
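
One possible answer can be sketched and tested by brute force. The grammar below is an assumption of this sketch, not the lecture's solution: it uses only rules of the forms A → a and A → aB, with the non-terminals S, A, B remembering whether the number of 1s seen so far leaves remainder 0, 1 or 2. Because these rule forms always emit a character, the empty string is not generated. The names RULES and generated are illustrative.

```python
from itertools import product

# Right-regular rules: S -> 0 | 0S | 1A,  A -> 0A | 1B,  B -> 0B | 1S | 1
RULES = {
    "S": ["0", "0S", "1A"],
    "A": ["0A", "1B"],
    "B": ["0B", "1S", "1"],
}

def generated(word, symbol="S"):
    """True if `word` can be derived from `symbol` with the rules above."""
    if not word:
        return False  # every rule emits at least one terminal character
    for rhs in RULES[symbol]:
        if rhs[0] != word[0]:
            continue
        if len(rhs) == 1:
            if len(word) == 1:  # rule of the form A -> a ends the derivation
                return True
        elif generated(word[1:], rhs[1]):  # rule of the form A -> aB
            return True
    return False

# Compare the grammar with the intended property on all short binary strings.
for n in range(1, 9):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert generated(w) == (w.count("1") % 3 == 0)
print("grammar agrees with the property for all strings up to length 8")
```

The check only covers strings up to length 8; it supports the claim but does not prove it for all lengths.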