Computational Linguistics II: Parsing


Formal Languages: Regular Languages II
Frank Richter & Jan-Philipp Söhn
fr@sfs.uni-tuebingen.de, jp.soehn@uni-tuebingen.de

Reminder: The Big Picture

hierarchy   grammar                machine           other
type 3      reg. grammar           DFA, NFA          reg. expressions
det. cf.    LR(k) grammar          DPDA
type 2      CFG                    PDA
type 1      CSG                    LBA
type 0      unrestricted grammar   Turing machine

DFA: Deterministic finite state automaton
(D)PDA: (Deterministic) Pushdown automaton
CFG: Context-free grammar
CSG: Context-sensitive grammar
LBA: Linear bounded automaton

Form of Grammars of Type 0-3

For $i \in \{0, 1, 2, 3\}$, a grammar $\langle N, T, P, S \rangle$ of Type $i$, with $N$ the set of non-terminal symbols, $T$ the set of terminal symbols ($N$ and $T$ disjoint, $\Sigma = N \cup T$), $P$ the set of productions, and $S$ the start symbol ($S \in N$), obeys the following restrictions:

T3: Every production in $P$ is of the form $A \to aB$ or $A \to \epsilon$, with $B, A \in N$, $a \in T$.
T2: Every production in $P$ is of the form $A \to x$, with $A \in N$ and $x \in \Sigma^*$.
T1: Every production in $P$ is of the form $x_1 A x_2 \to x_1 y x_2$, with $x_1, x_2 \in \Sigma^*$, $y \in \Sigma^+$, $A \in N$, and the possible exception of $C \to \epsilon$ in case $C$ does not occur on the righthand side of a rule in $P$.
T0: No restrictions.
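The Type 3 restriction can be checked mechanically. The following is a minimal sketch, assuming productions are encoded as pairs of a left-hand symbol and a tuple of right-hand symbols; the function name `is_type3` and this encoding are illustrative choices, not part of the slides.

```python
def is_type3(nonterminals, terminals, productions):
    """Check that every production has the form A -> aB or A -> epsilon.

    productions: iterable of (lhs, rhs) pairs, where lhs is a single symbol
    and rhs is a tuple of symbols (the empty tuple encodes epsilon).
    This encoding is an assumption made for the sketch.
    """
    for lhs, rhs in productions:
        if lhs not in nonterminals:
            return False
        if len(rhs) == 0:
            continue  # A -> epsilon
        if len(rhs) == 2 and rhs[0] in terminals and rhs[1] in nonterminals:
            continue  # A -> aB
        return False
    return True


# Hypothetical example grammar: S -> aS | bF, F -> epsilon
N = {"S", "F"}
T = {"a", "b"}
P = [("S", ("a", "S")), ("S", ("b", "F")), ("F", ())]
```

With this encoding, `is_type3(N, T, P)` holds, while adding a production such as `("S", ("a", "b"))` violates the Type 3 form because its right-hand side ends in a terminal.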

Regular Languages

Regular grammars, deterministic finite state automata, nondeterministic finite state automata, and regular expressions characterize the same class of languages, viz. Type 3 languages.

Reminder: DFA

Definition 1 (DFA) A deterministic FSA (DFA) is a quintuple $(\Sigma, Q, i, F, \delta)$ where $\Sigma$ is a finite set called the alphabet, $Q$ is a finite set of states, $i \in Q$ is the initial state, $F \subseteq Q$ is the set of final states, and $\delta$ is the transition function from $Q \times \Sigma$ to $Q$.

Reminder: Acceptance

Definition 3 (Acceptance) Given a DFA $M = (\Sigma, Q, i, F, \delta)$, the language $L(M)$ accepted by $M$ is $L(M) = \{x \in \Sigma^* \mid \hat{\delta}(i, x) \in F\}$, where $\hat{\delta}$ is the extension of $\delta$ from single symbols to strings.
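The acceptance condition can be sketched directly in code. The example below assumes the transition function is stored as a dictionary from (state, symbol) pairs to states; the names `delta_hat`, `accepts` and the example automaton are illustrative and not taken from the slides.

```python
def delta_hat(delta, q, x):
    """Extended transition function: apply delta symbol by symbol to x."""
    for symbol in x:
        q = delta[(q, symbol)]
    return q


def accepts(delta, initial, finals, x):
    """x is in L(M) iff delta_hat(initial, x) is a final state."""
    return delta_hat(delta, initial, x) in finals


# Hypothetical example DFA over {a, b}: strings with an even number of a's.
EVEN_A_DELTA = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
```

For instance, `accepts(EVEN_A_DELTA, "even", {"even"}, "abab")` holds, since the run ends in the state `even`.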

Nondeterministic Finite-state Automata

Definition 4 (NFA) A nondeterministic finite-state automaton is a quintuple $(\Sigma, Q, S, F, \delta)$ where $\Sigma$ is a finite set called the alphabet, $Q$ is a finite set of states, $S \subseteq Q$ is the set of initial states, $F \subseteq Q$ is the set of final states, and $\delta$ is the transition function from $Q \times \Sigma$ to $Pow(Q)$.

Theorem (Rabin/Scott)

For every language accepted by an NFA there is a DFA which accepts the same language.
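The usual proof of this theorem is the subset construction, in which each DFA state is a set of NFA states. The following is a minimal sketch of that construction, assuming the NFA transition function is a dictionary whose missing entries stand for the empty set; the function names and the example NFA are hypothetical.

```python
def determinize(alphabet, starts, finals, delta):
    """Subset construction: build a DFA whose states are sets of NFA states.

    delta maps (state, symbol) to a set of states; a missing entry is
    treated as the empty set (an assumption of this sketch).
    """
    start = frozenset(starts)
    dfa_delta = {}
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for a in alphabet:
            target = frozenset(
                q2 for q in current for q2 in delta.get((q, a), set())
            )
            dfa_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is final iff it contains at least one final NFA state.
    dfa_finals = {s for s in seen if s & set(finals)}
    return seen, start, dfa_finals, dfa_delta


def dfa_accepts(start, dfa_finals, dfa_delta, x):
    q = start
    for a in x:
        q = dfa_delta[(q, a)]
    return q in dfa_finals


# Hypothetical NFA over {a, b}: strings whose second-to-last symbol is a.
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
    ("q1", "a"): {"q2"}, ("q1", "b"): {"q2"},
}
```

Only the subsets reachable from the initial subset are built, so the resulting DFA is often much smaller than the full power set, although the worst case is still exponential.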

Regular Expressions

Given an alphabet $\Sigma$ of symbols, the following are all and only the regular expressions over the alphabet $\Sigma \cup \{\text{Ø}, 0, \mid, {*}, [, ]\}$:

Ø             empty set
0             the empty string (also written $\epsilon$, [])
$\sigma$      for all $\sigma \in \Sigma$
[$\alpha$ | $\beta$]   union (for $\alpha$, $\beta$ reg. ex.; also written $\alpha \cup \beta$, $\alpha + \beta$)
[$\alpha$ $\beta$]     concatenation (for $\alpha$, $\beta$ reg. ex.)
[$\alpha$*]            Kleene star (for $\alpha$ reg. ex.)

Meaning of Regular Expressions

$L(\text{Ø}) = \emptyset$   (the empty language)
$L(0) = \{\epsilon\}$   (the empty-string language)
$L(\sigma) = \{\sigma\}$
$L([\alpha \mid \beta]) = L(\alpha) \cup L(\beta)$
$L([\alpha\ \beta]) = L(\alpha) \cdot L(\beta)$   (concatenation of languages)
$L([\alpha{*}]) = (L(\alpha))^*$

$\Sigma^*$ is called the universal language. Note that the universal language is given relative to a particular alphabet.
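The semantic clauses above can be turned into a small evaluator. Since most regular languages are infinite, this sketch only enumerates the strings up to a length bound; the tuple encoding of expressions and the name `language` are assumptions made for illustration.

```python
def language(expr, max_len):
    """Return all strings of length <= max_len denoted by expr.

    expr is one of (hypothetical encoding):
      ("empty",), ("eps",), ("sym", s),
      ("union", a, b), ("concat", a, b), ("star", a)
    """
    kind = expr[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "union":
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "concat":
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":
        base = language(expr[1], max_len)
        result = {""}
        frontier = {""}
        # Keep appending base strings until no new bounded string appears.
        while frontier:
            frontier = {
                u + v for u in frontier for v in base if len(u + v) <= max_len
            } - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")


# [[a | [b b]]*] in the notation of the slides
EXPR = ("star", ("union", ("sym", "a"), ("concat", ("sym", "b"), ("sym", "b"))))
```

Bounding the length keeps the star case finite: every string of $(L(\alpha))^*$ within the bound is built from shorter strings that are themselves within the bound.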

Theorem (Kleene)

The set of languages which can be described by regular expressions is the set of regular languages.

Pumping Lemma for Regular Languages

uvw theorem: For each regular language $L$ there is an integer $n$ such that for each $x \in L$ with $|x| \geq n$ there are $u, v, w$ with $x = uvw$ such that
1. $|v| \geq 1$,
2. $|uv| \leq n$,
3. for all $i \in \mathbb{N}_0$: $uv^iw \in L$.

A Non-regular Language

Corollary Let $\Sigma$ be $\{a, b\}$. $L = \{a^n b^n \mid n \in \mathbb{N}\}$ is not regular.

Proof Assume $k \in \mathbb{N}$. For each $a^k b^k = uvw$ with $v \neq \epsilon$, either
1. $v = a^l$, $0 < l \leq k$, or
2. $v = a^{l_1} b^{l_2}$, $0 < l_1, l_2 \leq k$, or
3. $v = b^l$, $0 < l \leq k$.
In each case we have $uv^2w \notin L$. The result follows with the Pumping Lemma.
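The case analysis in the proof can be checked by brute force for small $k$. The sketch below enumerates every decomposition $a^k b^k = uvw$ with $v \neq \epsilon$ and tests whether pumping once ($uv^2w$) stays in $L$. The function names are hypothetical, and $\mathbb{N}$ is taken to exclude 0, in line with the separate use of $\mathbb{N}_0$ on the pumping lemma slide.

```python
def in_anbn(s):
    """Membership in L = {a^n b^n | n >= 1}."""
    n = len(s) // 2
    return len(s) % 2 == 0 and n >= 1 and s == "a" * n + "b" * n


def pumped_counterexamples(k):
    """Decompositions uvw of a^k b^k (v nonempty) with u v v w still in L."""
    x = "a" * k + "b" * k
    bad = []
    for i in range(len(x)):
        for j in range(i + 1, len(x) + 1):
            u, v, w = x[:i], x[i:j], x[j:]
            if in_anbn(u + v * 2 + w):
                bad.append((u, v, w))
    return bad
```

An empty result for a given `k` means that no decomposition of $a^k b^k$ survives pumping. This is an illustration of the argument for finitely many values, not a substitute for the proof.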

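Natural and Regular Languages

Corollary German is not a regular language.

Proof Consider
$L_1 = \{$Ein Spion (der einen Spion)$^k$ observiert$^l$ wird meist selbst observiert$\}$
(roughly: "A spy who observes a spy who observes a spy ... is usually observed himself").
$L_1$ is regular.
$L_1 \cap \text{German} = \{$Ein Spion (der einen Spion)$^k$ observiert$^k$ wird meist selbst observiert$\}$ is not regular.
If German were regular, $L_1 \cap \text{German}$ would be regular as well, because regular languages are closed under intersection.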

Theorem (Myhill/Nerode)

The following three statements are equivalent:
1. The set $L \subseteq \Sigma^*$ is accepted by some DFA.
2. $L$ is the union of some of the equivalence classes of a right invariant equivalence relation of finite index.
3. Let the equivalence relation $R_L$ be defined by: $x R_L y$ iff for all $z \in \Sigma^*$, $xz \in L$ iff $yz \in L$. Then $R_L$ is of finite index.
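As a worked illustration of statement 3 (not part of the original slides), take $L = \{a^n b^n \mid n \in \mathbb{N}\}$ and compare the prefixes $a^i$ and $a^j$ using the suffix $z = b^i$:

```latex
\[
a^i \cdot b^i = a^i b^i \in L,
\qquad
a^j \cdot b^i = a^j b^i \notin L
\quad (i \neq j;\ i, j \geq 1).
\]
```

Hence $a^i$ and $a^j$ are not related by $R_L$ whenever $i \neq j$, so $R_L$ has infinitely many equivalence classes and, by the theorem, $L$ is not accepted by any DFA.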

Minimization

For every nondeterministic finite-state automaton there exists an equivalent deterministic automaton with a minimal number of states.
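One common way to obtain such a minimal automaton is to determinize first and then merge indistinguishable states by partition refinement (often attributed to Moore). The slides do not say which algorithm is intended, so the following is only a sketch of that technique, assuming a complete DFA in which every state is reachable; all names are hypothetical.

```python
def equivalent_state_blocks(states, alphabet, finals, delta):
    """Partition refinement: return the blocks of indistinguishable states.

    Assumes a complete DFA whose states are all reachable from the
    initial state (unreachable states should be removed beforehand).
    """
    finals = set(finals)
    # Start from the partition {final states, non-final states}.
    block_of = {q: int(q in finals) for q in states}
    while True:
        # Two states stay together only if they are in the same block
        # and their successors are in the same blocks for every symbol.
        signature = {
            q: (block_of[q],) + tuple(block_of[delta[(q, a)]] for a in alphabet)
            for q in states
        }
        ids = {sig: n for n, sig in enumerate(sorted(set(signature.values())))}
        refined = {q: ids[signature[q]] for q in states}
        if len(ids) == len(set(block_of.values())):
            break  # no block was split: the partition is stable
        block_of = refined
    blocks = {}
    for q in states:
        blocks.setdefault(refined[q], set()).add(q)
    return {frozenset(b) for b in blocks.values()}


# Hypothetical DFA for "even number of a's" with a redundant state.
STATES = ["e1", "e2", "o"]
DELTA = {
    ("e1", "a"): "o", ("e1", "b"): "e2",
    ("e2", "a"): "o", ("e2", "b"): "e1",
    ("o", "a"): "e2", ("o", "b"): "o",
}
```

Each block of the final partition becomes one state of the minimal DFA. In the example, `e1` and `e2` end up in the same block, which leaves two states.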

Closure Properties of Regular Languages

Regular languages are closed under
- union (regular expression)
- intersection (e.g. constructively)
- complement (DFA)
- product (regular expression)
- Kleene star (regular expression)
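Closure under intersection and complement can be shown constructively on DFAs. The sketch below uses the standard product construction for intersection and swaps final and non-final states for complement. It assumes complete DFAs over a shared alphabet, encoded as (initial, finals, delta) triples; this encoding and the example automata are illustrative.

```python
def intersect(alphabet, dfa1, dfa2):
    """Product construction: run both DFAs in parallel on pairs of states."""
    (i1, f1, d1), (i2, f2, d2) = dfa1, dfa2
    start = (i1, i2)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            target = (d1[(p, a)], d2[(q, a)])
            delta[((p, q), a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A pair is final iff both components are final.
    finals = {(p, q) for (p, q) in seen if p in f1 and q in f2}
    return start, finals, delta


def complement(states, dfa):
    """Complement of a complete DFA: swap final and non-final states."""
    initial, finals, delta = dfa
    return initial, set(states) - set(finals), delta


def run(dfa, x):
    q, finals, delta = dfa
    for a in x:
        q = delta[(q, a)]
    return q in finals


# Hypothetical example DFAs over {a, b}.
EVEN_A = ("even", {"even"}, {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
})
ENDS_B = ("n", {"y"}, {
    ("n", "a"): "n", ("n", "b"): "y",
    ("y", "a"): "n", ("y", "b"): "y",
})
BOTH = intersect("ab", EVEN_A, ENDS_B)
ODD_A = complement({"even", "odd"}, EVEN_A)
```

Note that the complement construction relies on determinism and completeness; applied to an NFA, swapping final states does not in general yield the complement language.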

Decidable Problems for Reg. Languages

1. Word problem
2. Emptiness
3. Finiteness
4. Intersection
5. Equivalence
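Two of these problems have particularly short decision procedures on DFAs. Emptiness reduces to reachability of a final state, and equivalence can be decided by searching the product automaton for a reachable pair of states on which the two automata disagree. The sketch below assumes complete DFAs encoded as (initial, finals, delta) triples; the slides do not prescribe these procedures, and all names are hypothetical.

```python
def is_empty(alphabet, dfa):
    """L(M) is empty iff no final state is reachable from the initial state."""
    initial, finals, delta = dfa
    seen, todo = {initial}, [initial]
    while todo:
        q = todo.pop()
        if q in finals:
            return False
        for a in alphabet:
            t = delta[(q, a)]
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return True


def equivalent(alphabet, dfa1, dfa2):
    """Search the product for a reachable pair where exactly one side is final."""
    (i1, f1, d1), (i2, f2, d2) = dfa1, dfa2
    seen, todo = {(i1, i2)}, [(i1, i2)]
    while todo:
        p, q = todo.pop()
        if (p in f1) != (q in f2):
            return False  # some string is accepted by only one automaton
        for a in alphabet:
            t = (d1[(p, a)], d2[(q, a)])
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return True


# Hypothetical example DFAs over {a, b}.
EVEN_A = ("even", {"even"}, {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
})
EVEN_A_REDUNDANT = ("e1", {"e1", "e2"}, {
    ("e1", "a"): "o", ("e1", "b"): "e2",
    ("e2", "a"): "o", ("e2", "b"): "e1",
    ("o", "a"): "e2", ("o", "b"): "o",
})
ENDS_B = ("n", {"y"}, {
    ("n", "a"): "n", ("n", "b"): "y",
    ("y", "a"): "n", ("y", "b"): "y",
})
```

Both searches visit each reachable state (or pair of states) at most once, so they terminate on every finite automaton.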