Machines and languages theory. Lecture 1


Instructor: Fatemeh Daneshfar
E-mail: f.daneshfar@uok.ac.ir
TA: ?
Text: An Introduction to Formal Languages and Automata (5th ed.), Linz (Jones & Bartlett)
Webpage: http://eng.uok.ac.ir/daneshfar

Requirements
30% - Exam 1 (21st Aban)
30% - Exam 2 (26th Azar)
40% - Final Exam (two hours), cumulative
5% - Presentation
Students may not do extra credit work to make up for missing exams or poor exam grades.

Course overview
The course discusses two very closely related concepts: models of computing and languages.
The computing models are simplified as much as possible, so as to boil them down to their most important elements. Some of the models are more powerful than others.
Languages are sets of strings built by particular rules; the set of rules is called a grammar. The more complex the grammar, the more complicated the language can be.

Models of computing
Finite state machines (automata)
  Pattern recognition
  Simple circuits (e.g. elevators, sliding doors)
Automata with stack memory (pushdown automata)
  Parsing computer languages
Automata with limited tape memory
Automata with infinite tape memory, called Turing machines
  Most powerful model possible
  Capable of solving anything that is solvable

Chomsky hierarchy of grammars
Regular grammars
Context-free grammars
Context-sensitive grammars
Unrestricted grammars
We'll define what these mean later, but the important point is that the grammars become more complex as we go down the list, and each class contains those above it.

Computers recognize languages!
Computers can be made to recognize, or accept, the strings of a language. There is a correspondence between the power of the computing model and the complexity of the languages it can recognize:
  Finite automata accept exactly the languages generated by regular grammars.
  Pushdown automata can also accept the languages generated by context-free grammars.
  Turing machines can accept the languages generated by any grammar.
This is why we study both machines and languages in this course!

Syllabus

Week  Section  Topic
1     1.1      Mathematical Preliminaries
2     1.2      Basic Concepts
      2.1      Deterministic Finite Automata (dfa)
3     2.2      Nondeterministic Finite Automata (ndfa)
      2.3      Equivalence of dfa's and ndfa's
4     3.1      Regular Expressions (re)
5     3.2      Connection between re's and Regular Languages
      3.3      Regular Grammars
6     4.1      Closure Properties of Regular Languages
      4.2      Elementary Questions about Regular Languages
7     4.3      Identifying Nonregular Languages
8     5.1      Context-free Grammars (cfg)
      5.2      Parsing and Ambiguity
9     6.2      Chomsky and Greibach Normal Forms for cfg's
      7.1      Nondeterministic Pushdown Automata (pda)
      7.2      Pda's and Context-free Grammars
10    8.1      Pumping Lemma for Context-free Languages
      8.2      Closure Properties of Context-free Languages
11    9.1      Turing Machines
      9.2      More on Turing Machines
12    9.3      Turing's Thesis
      11.1     Recursive and Recursively Enumerable Languages
13    11.4     The Chomsky Hierarchy
      12.1     Unsolvable Problems
14    12.2     Undecidable Problems for Rec. Enum. Languages
15             Final Exam

Chapter 1 Introduction to the Theory of Computing

Introduction to the Theory of Computing
Mathematical Preliminaries and Notation
  Sets
  Functions and Relations
  Graphs and Trees
  Proof Techniques

Mathematical Preliminaries and Notation
Sets
notation: {1, 2, 3, 4, 5}, {1, 2, ..., 5}, {x : 1 ≤ x ≤ 5}
membership (with S the set above): 3 ∈ S, 7 ∉ S, an apple ∉ S

Sets: union and intersection
Know what ∪ (union) and ∩ (intersection) mean.
Universal set: understand that the universal set is merely ALL the things that are under discussion. It may be implicit or explicit.

Sets: complement and difference
Complement: the complement of S, written as S' (S with a bar over it in the book), is the set of all elements of the universe not in S.
Difference: the difference between S and T is S - T = S ∩ T'.

Sets: empty (null) set
The empty set ∅ is the set containing no elements.

Sets: DeMorgan's Laws
(S' ∩ T')' = S ∪ T
(S' ∪ T')' = S ∩ T
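The set identities above can be spot-checked on a small example. The sketch below uses Python's built-in sets; the universe U and the sets S and T are hypothetical values chosen only for illustration, and a check on one example is of course not a proof.

```python
# Hypothetical universe and sets, chosen only for illustration.
U = set(range(10))
S = {1, 2, 3, 4, 5}
T = {4, 5, 6, 7}

def complement(X, universe=U):
    """Complement of X relative to the universal set."""
    return universe - X

# DeMorgan's laws, written with complements on both operands.
union_via_demorgan = complement(complement(S) & complement(T))
intersection_via_demorgan = complement(complement(S) | complement(T))

print(union_via_demorgan == S | T)         # True
print(intersection_via_demorgan == S & T)  # True

# Difference expressed through intersection with a complement.
print(S - T == S & complement(T))          # True
```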

Sets: subset and disjoint
Subset: S ⊆ T iff every element in S is also in T.
Proper subset: S ⊂ T (S is a proper subset of T) if S ⊆ T and there is something in T that is not in S.
Disjoint: two sets are disjoint if their intersection is empty; they have no elements in common.

Sets: infinite vs. finite sets
A set is infinite if its elements cannot all be listed in a finite list.
There are two classes of infinity: enumerable (countable) and not enumerable (uncountable).
This will become very important in this class.

Sets: powerset
The powerset of a set S (written as 2^S) is the set of all the subsets of the set S.
There's a reason for the notation: if S has n elements, then 2^S has 2^n elements.
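As a small illustration of the powerset and of why the notation suggests a power of two, the following sketch enumerates all subsets of a finite set. The helper name `powerset` and the sample set are assumptions made for this example.

```python
from itertools import chain, combinations

def powerset(s):
    """Return all subsets of the finite set s, smallest subsets first."""
    items = sorted(s)
    all_sizes = (combinations(items, r) for r in range(len(items) + 1))
    return [set(c) for c in chain.from_iterable(all_sizes)]

S = {1, 2, 3}  # hypothetical example set
subsets = powerset(S)
print(subsets)
print(len(subsets) == 2 ** len(S))  # True: a set with n elements has 2^n subsets
```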

Sets: Cartesian product
The Cartesian product of two sets S and T, written as S × T, is the set of all the ordered pairs created by choosing one element of S and one element of T.

Functions
A function f : S → T is a mapping of elements of S to unique elements of T.
Terms to know: domain, range.

Relations
A relation between S and T is a set of ordered pairs (s, t) taken from these sets. A relation is a subset of the Cartesian product of S and T.
A function is a special kind of relation.
Properties: reflexive, symmetric, transitive.
A relation that is reflexive, symmetric and transitive is called an equivalence relation and partitions the underlying set.
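To make the last point concrete, here is a minimal sketch that stores a relation as a set of ordered pairs, tests the three properties, and recovers the partition. The example relation (congruence modulo 3 on {0, ..., 5}) and all function names are assumptions chosen for illustration.

```python
def is_reflexive(rel, s):
    return all((x, x) in rel for x in s)

def is_symmetric(rel):
    return all((y, x) in rel for (x, y) in rel)

def is_transitive(rel):
    return all((x, w) in rel
               for (x, y) in rel
               for (z, w) in rel
               if y == z)

def equivalence_classes(rel, s):
    """Group each element with everything it is related to."""
    return {frozenset(y for y in s if (x, y) in rel) for x in s}

# Hypothetical example: x ~ y iff x and y leave the same remainder mod 3.
S = set(range(6))
rel = {(x, y) for x in S for y in S if (x - y) % 3 == 0}

print(is_reflexive(rel, S), is_symmetric(rel), is_transitive(rel))
print(equivalence_classes(rel, S))
```

The classes returned are pairwise disjoint and together cover S, which is what "partitions the underlying set" means.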

Graphs and Trees
Read the section and learn the notation.

Proof Techniques
Proof by contradiction
Proof by mathematical induction

Mathematical induction
Claim: for all n ≥ 1, the sum of all integers from 1 to n is n(n + 1)/2.
Basis: Consider n = 1. The sum of all integers from 1 to 1 is 1, and (1)(2)/2 = 1 as well, so the claim holds in this case.
Hypothesis: Assume that the sum of all integers from 1 to n is n(n + 1)/2.
Induction: We show that the claim holds for n + 1, where n is the number from the hypothesis. The sum of the integers from 1 to n + 1 is the sum of the integers from 1 to n, plus n + 1. By the hypothesis, this is n(n + 1)/2 + (n + 1) = (n(n + 1) + 2(n + 1))/2 = (n + 1)(n + 2)/2, which is the formula with n + 1 in place of n. Done!

Introduction to the Theory of Computing: Three Basic Concepts
Languages
An alphabet Σ is a finite set of symbols, e.g. Σ = {a, b}.
A string or word is any finite sequence of symbols from the alphabet, e.g. w = abaaa.
λ: the empty string.
Σ*: the set of all strings on Σ (Σ+ = Σ* - {λ}).
A language is any set of words (a subset L of Σ*).
Sentence: a string in L.

Languages
A language is a set of strings. If Σ is an alphabet, then a language over Σ is a collection of strings whose components come from Σ. So Σ* is the biggest possible language over Σ, and every other language over Σ is a subset of Σ*.

Examples of languages
Four simple examples of languages over an alphabet Σ are the sets ∅, {λ}, Σ, and Σ*. For example, if Σ = {a} then these four simple languages over Σ are ∅, {λ}, {a}, and {λ, a, aa, aaa, ...}.
Recall that λ is the empty string, so {λ} is the language containing only the empty string, while ∅ is the empty set. Σ* is an infinite set.

Example: English
The alphabet is A = {a, b, c, d, e, ..., x, y, z}.
The English language is made of strings formed from A, e.g. fun, excitement.
We could define the English language as the set of strings over A which appear in the Oxford English Dictionary (but this is clearly not the only possible definition).

Other Examples
Σ = {a, b}
Σ* = {λ, a, b, aa, ab, ba, bb, aaa, ...}
L1 = {a, aa, aab} (finite language)
L2 = {a^n b^n : n ≥ 0} = {λ, ab, aabb, ...}
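The listing of all strings over {a, b} and the membership condition for the language of strings a...ab...b with equally many a's and b's can be sketched in a few lines. The function names below are assumptions for this illustration, and the empty string is represented by Python's "".

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """List every string over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            result.append("".join(letters))
    return result

def in_anbn(w):
    """Membership test for the language of n a's followed by n b's, n >= 0."""
    n, remainder = divmod(len(w), 2)
    return remainder == 0 and w == "a" * n + "b" * n

sigma = {"a", "b"}
words = strings_up_to(sigma, 2)
print(words)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print([w for w in strings_up_to(sigma, 4) if in_anbn(w)])  # ['', 'ab', 'aabb']
```

Only a finite prefix of the infinite set of all strings can ever be listed, which is why the helper takes a length bound.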

Concatenation
The natural operation of concatenation of strings places two strings in juxtaposition. For example, the concatenation of the two strings aab and ba is the string aabba. We use the name "cat" to denote this operation: cat(aab, ba) = aabba.

Combining Languages
We can also combine two languages L and M by forming the set of all concatenations of strings in L with strings in M.

Products of languages
This new language is called the product of L and M and is denoted by L · M. A formal definition can be given as follows:
L · M = {cat(s, t) : s ∈ L and t ∈ M}
or, writing concatenation as juxtaposition, L1L2 = {xy : x ∈ L1, y ∈ L2}.
For example, if L = {ab, ac} and M = {a, bc, abc}, then the product L · M is the language
L · M = {aba, abbc, ababc, aca, acbc, acabc}.
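The product of two finite languages is easy to compute directly from the definition. The sketch below is one way to do it, with languages represented as Python sets of strings; the name `lang_product` is an assumption for this example.

```python
def lang_product(L, M):
    """All concatenations of a string from L followed by a string from M."""
    return {s + t for s in L for t in M}

L = {"ab", "ac"}
M = {"a", "bc", "abc"}

print(sorted(lang_product(L, M)))
# ['aba', 'ababc', 'abbc', 'aca', 'acabc', 'acbc']

# The order of the operands matters.
print(lang_product(L, M) == lang_product(M, L))  # False
```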

Properties of products
The following simple properties hold for any language L:
L · {λ} = {λ} · L = L
L · ∅ = ∅ · L = ∅
The product is not commutative. In other words, we can find two languages L and M such that L · M ≠ M · L.
The product is associative. In other words, if L, M, and N are languages, then L · (M · N) = (L · M) · N.

Powers of languages
If L is a language, then the product L · L is denoted by L^2. The language product L^n for every n ∈ {0, 1, 2, ...} is defined as follows:
L^0 = {λ}
L^n = L · L^(n-1) if n > 0

Example
If L = {a, bb} then the first few powers of L are
L^0 = {λ}
L^1 = L = {a, bb}
L^2 = L · L = {aa, abb, bba, bbbb}
L^3 = L · L^2 = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
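The recursive definition of language powers translates almost line for line into code. This is a minimal sketch with languages as Python sets and "" standing for the empty string; `lang_product` and `lang_power` are names assumed for the example.

```python
def lang_product(L, M):
    """All concatenations of a string from L followed by a string from M."""
    return {s + t for s in L for t in M}

def lang_power(L, n):
    """n-th power of L: the 0-th power is {""}, and each further power prepends L."""
    result = {""}
    for _ in range(n):
        result = lang_product(L, result)
    return result

L = {"a", "bb"}
for n in range(4):
    print(n, sorted(lang_power(L, n)))
```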

Languages Example 2:
L = {a^n b^n : n ≥ 0}
L^2 = {a^n b^n a^m b^m : n ≥ 0, m ≥ 0}

Closure of a language
If L is a language over Σ (i.e. L ⊆ Σ*) then the closure of L is the language denoted by L* and is defined as follows:
L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
The positive closure of L is the language denoted by L+ and defined as follows:
L+ = L^1 ∪ L^2 ∪ L^3 ∪ ...

L* vs. L+
It follows that L* = L+ ∪ {λ}. But it's not necessarily true that L+ = L* - {λ}. For example, if we let our alphabet be Σ = {a} and our language be L = {λ, a}, then L+ = L*.
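Both closures are infinite unions, so code can only approximate them by cutting the union off at some power. The sketch below does that and compares the two truncated closures for a language that contains the empty string and one that does not. The helper names and the cutoff value 3 are assumptions for illustration; agreement up to a cutoff illustrates the claim but does not prove it.

```python
def lang_product(L, M):
    return {s + t for s in L for t in M}

def lang_power(L, n):
    result = {""}
    for _ in range(n):
        result = lang_product(L, result)
    return result

def union_of_powers(L, lowest, highest):
    """Union of the powers of L from `lowest` to `highest` (a truncated closure)."""
    result = set()
    for n in range(lowest, highest + 1):
        result |= lang_power(L, n)
    return result

with_empty = {"", "a"}   # contains the empty string
without_empty = {"a"}    # does not contain the empty string

star_1, plus_1 = union_of_powers(with_empty, 0, 3), union_of_powers(with_empty, 1, 3)
star_2, plus_2 = union_of_powers(without_empty, 0, 3), union_of_powers(without_empty, 1, 3)

print(plus_1 == star_1)           # True: the empty string is already in every power
print(plus_2 == star_2 - {""})    # True: here removing the empty string gives the positive closure
```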

Properties of Closure
Let L and M be languages over the alphabet Σ. Then:
a) {λ}* = ∅* = {λ}
b) L* = L* · L* = (L*)*
c) λ ∈ L if and only if L+ = L*
d) (L* · M*)* = (L* ∪ M*)* = (L ∪ M)*
e) L · (M · L)* = (L · M)* · L

Grammars
A grammar for a natural language tells us whether a particular sentence is well-formed or not.
<sentence> → <noun-phrase><predicate>
<noun-phrase> → <article><noun>
<predicate> → <verb>
<article> → a | the
<noun> → boy | dog
<verb> → runs | walks

Three Basic Concepts: Grammars
A grammar is a finite set of rules (called productions) over an alphabet and a set of variables (non-terminals) that defines the structure of the strings in a language.
Rule: x → y, where x and y are strings containing symbols from the alphabet and variables from the set of variables.
Start symbol: one variable is designated as special. It is called the start symbol.

Grammars
Formal grammar: G = (V, T, S, P)
V: finite set of variables
T: finite set of terminal symbols
S ∈ V: start variable
P: finite set of productions

Productions
A grammar rule x → y is often called a production, and it can be read in any of several ways: "replace x by y", "x produces y", "x rewrites to y", "x reduces to y".

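Grammars
Productions: x → y, with x ∈ (V ∪ T)+ and y ∈ (V ∪ T)*
If x → y is a production, then w = uxv derives z = uyv, written w ⇒ z.
w1 ⇒* wn if w1 ⇒ w2 ⇒ ... ⇒ wn, or w1 = wn (zero or more steps)
w1 ⇒+ wn if the derivation takes at least one step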

Other shorthand
The following three symbols with their associated meanings are used quite often in discussing derivations:
⇒ derives in one step
⇒+ derives in one or more steps
⇒* derives in zero or more steps

Where to begin
Every grammar has a special grammar symbol called a start symbol, and there must be at least one production whose left side consists of only the start symbol. For example, if S is the start symbol for a grammar, then there must be at least one production of the form S → x.

Grammars
Generated language: for G = (V, T, S, P), L(G) = {w ∈ T* : S ⇒* w}
Derivation: S ⇒ w1 ⇒ w2 ⇒ ... ⇒ wn ⇒ w, for w ∈ L(G)
Sentential forms: S, w1, w2, ..., wn (containing variables)

Grammars Example 3:
G = ({S}, {a, b}, S, P)
P: S → aSb
   S → λ
Derivation: S ⇒ aSb ⇒ aaSbb ⇒ aabb
aabb: sentence
aaSbb: sentential form
L(G) = {a^n b^n : n ≥ 0}
Question: what about L(G) = {a^n b^(n+1) : n ≥ 0}?
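One way to see which language the grammar of Example 3 generates is to enumerate its derivations mechanically. The sketch below assumes the two productions are S → aSb and S → (empty string), treats uppercase letters as variables, and always rewrites the leftmost variable. The pruning rule (drop sentential forms longer than max_len + 1) is an assumption that suffices for this grammar, since each form has at most one variable; it is not a general-purpose bound.

```python
from collections import deque

def generate(productions, start, max_len):
    """Collect all terminal strings of length <= max_len derivable from `start`."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is a sentence.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            if len(form) <= max_len:
                words.add(form)
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len + 1 and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words

# Assumed reading of Example 3; "" stands for the empty string.
P = {"S": ["aSb", ""]}
print(sorted(generate(P, "S", 6), key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```

Every string produced has the same number of a's and b's, with all a's first, which matches the stated language for n = 0, 1, 2, 3.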

Grammars Example 4:

Grammars Example 5:
G2 = ({S}, {a, b}, S, P2)
P2: S → SS
    S → λ
    S → aSb
    S → bSa
L(G2) = {w : n_a(w) = n_b(w)}, where n_a(w) and n_b(w) are the numbers of a's and b's in w.

Example
Let A = {a, b, c}. Then a grammar for the language A* can be described by the following four productions:
S → λ
S → aS
S → bS
S → cS
Or in shorthand: S → λ | aS | bS | cS, read as "S can be replaced by either λ, or aS, or bS, or cS."

Sample derivation
Using S → λ | aS | bS | cS, we can derive aacb step by step:
S ⇒ aS ⇒ aaS ⇒ aacS ⇒ aacbS ⇒ aacbλ = aacb
A shorthand way of showing that a derivation exists: S ⇒* aacb (derives in zero or more steps).

A more complex grammar
S → AB
A → λ | aA
B → λ | bB
We can deduce that the grammar's non-terminal symbols are S, A, and B, the start symbol is S, and the language alphabet is {a, b}.

Another derivation
Let's consider the string aab. The statement S ⇒+ aab means that there exists a derivation of aab that takes one or more steps. For example, we have
S ⇒ AB ⇒ aAB ⇒ aaAB ⇒ aaB ⇒ aabB ⇒ aab.
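The derivation of aab can be replayed mechanically by applying one production at a time to the leftmost occurrence of its variable. The sketch below assumes the grammar has the productions S → AB, A → aA, A → (empty string), B → bB, and B → (empty string), which is the reading under which the six-step derivation above goes through; the function and variable names are hypothetical.

```python
# Assumed productions; "" stands for the empty string.
productions = {
    "S": ["AB"],
    "A": ["", "aA"],
    "B": ["", "bB"],
}

def replay(start, steps):
    """Apply each (variable, right-hand side) step to the leftmost occurrence."""
    forms = [start]
    current = start
    for variable, rhs in steps:
        if rhs not in productions[variable]:
            raise ValueError(f"{variable} -> {rhs!r} is not a production")
        if variable not in current:
            raise ValueError(f"{variable} does not occur in {current!r}")
        current = current.replace(variable, rhs, 1)
        forms.append(current)
    return forms

steps = [("S", "AB"), ("A", "aA"), ("A", "aA"), ("A", ""), ("B", "bB"), ("B", "")]
forms = replay("S", steps)
print(" => ".join(f or "(empty)" for f in forms))
# S => AB => aAB => aaAB => aaB => aabB => aab
```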

Finite languages
If the language is finite, then a grammar can consist of all productions of the form S → w for each string w in the language. For example, the language {a, ba} can be described by the grammar S → a | ba.

Infinite languages
If the language is infinite, then some production or sequence of productions must be used repeatedly to construct the derivations. Notice that there is no bound on the length of strings in an infinite language. Therefore there is no bound on the number of derivation steps used to derive the strings. If the grammar has n productions, then any derivation consisting of n + 1 steps must use some production twice.

For example, the infinite language {a^n b : n ≥ 0} can be described by the grammar S → b | aS. To derive the string a^n b, use the production S → aS repeatedly (n times, to be exact) and then stop the derivation by using the production S → b. The production S → aS allows us to say "If S derives w, then it also derives aw."

Some simple grammars

Language                                   Grammar
{a, ab, abb, abbb}                         S → a | ab | abb | abbb
{λ, a, aa, aaa, ...}                       S → aS | λ
{b, bbb, bbbbb, ..., b^(2n+1), ...}        S → bbS | b
{b, abc, aabcc, ..., a^n b c^n, ...}       S → aSc | b
{ac, abc, abbc, ..., a b^n c, ...}         S → aBc, B → bB | λ

Automata
An abstract model of a digital computer, made up of:
Input file
Control unit
Storage
Output

Automata: input file
The input file is divided into squares.
Input is a string over a given alphabet. Each input square holds a symbol.
The symbols are read from left to right, one at a time.
The end of the input string can be detected.

Automata: storage
Storage consists of an unlimited number of cells.
Each cell can hold a symbol from an alphabet (which can be different from the input alphabet).
The contents of the storage cells can be read and changed.

Automata: control unit
The control unit has a finite number of internal states.
It can be in any one of the internal states.
It can change state in some defined manner.

Automata: transitions
Transition function: (current state, input symbol, storage info) → next state
  Output may be produced.
  Info in the storage may be changed.
Configuration: current state, input symbol, storage info
Move: current configuration → next configuration

Automata: general types
Accepter: yes/no output
Transducer: string of symbols as output
Deterministic: single move
Non-deterministic: multiple moves
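As a rough illustration of a deterministic accepter, the sketch below models only the control unit and the input file: it has a finite set of states, reads the input left to right one symbol at a time, makes exactly one move per symbol, and answers yes or no at the end of the input. It uses no storage, and the particular machine (accepting strings over {a, b} with an even number of a's) is a hypothetical example, not one taken from the lecture.

```python
def accepts(transitions, start, finals, string):
    """Run a deterministic accepter with no storage on `string`."""
    state = start
    for symbol in string:
        key = (state, symbol)
        if key not in transitions:
            return False  # no move defined for this configuration
        state = transitions[key]  # exactly one move per configuration
    return state in finals

# Hypothetical transition function: track the parity of the number of a's.
transitions = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}

print(accepts(transitions, "even", {"even"}, "abab"))  # True: two a's
print(accepts(transitions, "even", {"even"}, "ab"))    # False: one a
```

A non-deterministic machine would instead map each configuration to a set of possible next states, and a transducer would collect output symbols rather than return a single yes/no answer.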

Homework
Exercises: 4, 5, 6, 8, 9, 12, 15, 17 of Section 1.2 in Linz's book.
Reading: Section 1.3 in Linz's book.