Finite-State Transducers in Language and Speech Processing

Finite-State Transducers in Language and Speech Processing
Presenter: 郭榮芳, 05/20/2003
1. M. Mohri, On some applications of finite-state automata theory to natural language processing, Natural Language Engineering 2 (1996).
2. M. Mohri, Finite-state transducers in language and speech processing, Computational Linguistics 23 (2) (1997).

Outline
Introduction
Sequential string-to-string transducers
Power series and subsequential string-to-weight transducers
Application to speech recognition

Introduction Finite-state machines have been used in many areas of computational linguistics. Their use can be justified by both linguistic and computational arguments.

Linguistically Finite automata are convenient since they allow one to describe easily most of the relevant local phenomena encountered in the empirical study of language. They often lead to a compact representation of lexical rules, idioms, and clichés that appears natural to linguists (Gross, 1989).

Linguistically (cont.) Graphic tools also allow one to visualize and modify automata. This helps in correcting and completing a grammar. Other more general phenomena, such as parsing context-free grammars, can also be dealt with using finite-state machines such as RTNs (Woods, 1970).

Computational The use of finite-state machines is mainly motivated by considerations of time and space efficiency. Time efficiency is usually achieved by using deterministic automata. Deterministic automata have a deterministic input: for every state, there is at most one transition labeled with a given element of the alphabet. The output of a deterministic machine can be computed in time that depends, in general, linearly on the size of the input.

Computational (cont.) Space efficiency is achieved with classical minimization algorithms (Aho, Hopcroft, and Ullman, 1974) for deterministic automata. Applications such as compiler construction have shown deterministic finite automata to be very efficient in practice (Aho, Sethi, and Ullman, 1986).

Applications in natural language processing
Lexical analyzers
The compilation of morphological and phonological rules
Speech processing

The idea of deterministic automata Produce output strings or weights in addition to accepting (deterministically) the input. Benefits: time efficiency and space efficiency. Drawback: a possibly large increase in the size of the data.

Limitations of the corresponding techniques, however, are very often pointed out more than their advantages. The reason is probably that recent work in this field is not yet described in computer science textbooks. Sequential finite-state transducers are now used in all areas of computational linguistics.

The case of string-to-string transducers These transducers have been successfully used in the representation of large-scale dictionaries, computational morphology, and local grammars and syntax. We describe the theoretical bases for the use of these transducers. In particular, we recall classical theorems and give new ones characterizing these transducers.

The case of sequential string-to-weight transducers These transducers appear very interesting in speech processing (language models, phone lattices, and word lattices). We cover determinization, minimization, unambiguous transducers, and some applications in speech recognition.

Sequential string-to-string transducers Sequential string-to-string transducers are used in various areas of natural language processing. Both determinization (Mohri, 1994c) and minimization algorithms (Mohri, 1994b) have been defined for the class of p-subsequential transducers, which includes sequential string-to-string transducers. In this section the theoretical basis of the use of sequential transducers is described. Classical and new theorems help to indicate the usefulness of these devices as well as their characterization.

Sequential transducers Sequential transducers have a deterministic input: at any state there is at most one transition labeled with a given element of the input alphabet. Output labels might be strings, including the empty string ε.

Sequential transducers(cont.) Their use with a given input does not depend on the size of the transducer but only on that of the input. The total computational time is linear in the size of the input.
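This run-time behavior can be illustrated with a minimal sketch: a sequential transducer as two dictionaries keyed by (state, input symbol), so a run takes one step per input character regardless of the machine's size. The representation and the toy transducer below are illustrative assumptions, not from the paper.

```python
def run_sequential(transitions, outputs, initial, finals, word):
    """Return the output string for `word`, or None if it is rejected."""
    state, out = initial, ""
    for symbol in word:
        if (state, symbol) not in transitions:
            return None                       # no transition: input rejected
        out += outputs[(state, symbol)]       # emit the output label
        state = transitions[(state, symbol)]  # deterministic move
    return out if state in finals else None

# Toy transducer over {a, b}: copies a, rewrites b as bb.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
sigma = {(0, "a"): "a", (0, "b"): "bb", (1, "a"): "a", (1, "b"): "bb"}

print(run_sequential(delta, sigma, 0, {0, 1}, "abab"))  # abbabb
```

Each input character triggers exactly one dictionary lookup, which is the linearity claim above in executable form.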

Example of a sequential transducer

Definition of a non-sequential transducer T1 = (V1, i1, F1, A, B, δ1, σ1): V1 is the set of states, i1 is the initial state, F1 is the set of final states, A and B are finite sets corresponding respectively to the input and output alphabets of the transducer, δ1 is the state transition function, which maps V1 × A to the set of subsets of V1, and σ1 is the output function, which maps V1 × A × V1 to B*.

Definition of a subsequential transducer T2: i2 is the unique initial state, δ2 is the state transition function, which maps V2 × A to V2, σ2 is the output function, which maps V2 × A to B*, and Φ2 is the final output function, which maps F2 to B*.

Denote by x ∧ y the longest common prefix of two strings x and y, and by x^(-1)(xy) the string y obtained by dividing xy at the left by x. In the determinization, the states of T2 are subsets made of pairs (q, w) of a state q of T1 and a string w. For a subset q2 and an input symbol a, define: J1(a) = {(q, w) : δ1(q, a) is defined and (q, w) ∈ q2}, and J2(a) = {(q, w, q') : δ1(q, a) is defined, (q, w) ∈ q2, and q' ∈ δ1(q, a)}.
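The construction can be sketched in code: states of T2 are sets of (T1-state, delayed-output) pairs, each transition emits the longest common prefix of the candidate outputs, and the remainders are carried along in the subset. A hedged sketch: the dict-based representation and the toy transducer are illustrative, and the loop terminates only for determinizable transducers.

```python
from os.path import commonprefix  # character-wise longest common prefix

def determinize(delta, sigma, initial, finals, alphabet):
    """Subset construction with delayed outputs.
    delta: (q, a) -> set of next states; sigma: (q, a, q') -> output string.
    Returns T2's transition/output functions, its start state, and the
    final output strings attached to each final subset."""
    start = frozenset({(initial, "")})
    delta2, sigma2, final_out = {}, {}, {}
    stack, seen = [start], {start}
    while stack:
        S = stack.pop()
        final_out[S] = {w for (q, w) in S if q in finals}
        for a in alphabet:
            cand = [(q2, w + sigma[(q, a, q2)])
                    for (q, w) in S
                    for q2 in delta.get((q, a), set())]
            if not cand:
                continue
            out = commonprefix([w for (_, w) in cand])  # emit what is certain
            T = frozenset((q2, w[len(out):]) for (q2, w) in cand)  # keep remainders
            delta2[(S, a)], sigma2[(S, a)] = T, out
            if T not in seen:
                seen.add(T)
                stack.append(T)
    return delta2, sigma2, start, final_out

# Nondeterministic T1: on 'a', state 0 goes to 1 (output a) or 2 (output b);
# both reach final state 3 on 'b' with output a.
delta = {(0, "a"): {1, 2}, (1, "b"): {3}, (2, "b"): {3}}
sigma = {(0, "a", 1): "a", (0, "a", 2): "b", (1, "b", 3): "a", (2, "b", 3): "a"}
d2, s2, start, fo = determinize(delta, sigma, 0, {3}, "ab")

# Run "ab" through the result: deterministic transitions, final outputs at the end.
S, out = start, ""
for c in "ab":
    out += s2[(S, c)]
    S = d2[(S, c)]
print(sorted(out + t for t in fo[S]))  # ['aa', 'ba']
```

The ambiguity of T1 ends up entirely in the final output strings, which is exactly the p-subsequential shape discussed below.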

Transducer T1 Subsequential transducer T2 obtained from T1 by determinization.

Definition of a sequential string-to-string transducer More formally, a sequential string-to-string transducer T is a 7-tuple (Q, i, F, Σ, Δ, δ, σ): Q is the set of states, i ∈ Q is the initial state, F ⊆ Q is the set of final states, Σ and Δ are finite sets corresponding respectively to the input and output alphabets of the transducer, δ is the state transition function, which maps Q × Σ to Q, and σ is the output function, which maps Q × Σ to Δ*.

Subsequential and p-subsequential transducers p: at most p final output strings at each final state. p-subsequential transducers seem to be sufficient for describing linguistic ambiguities.

Subsequential and p-subsequential transducers (cont.) Figure 2: example of a 2-subsequential transducer t1. Example: the input string w = aa gives two distinct outputs, aaa and aab.
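The role of the final output strings can be sketched as follows. The transducer encoded below is an assumed reading of the t1 example (input aa, outputs aaa and aab); the representation is illustrative.

```python
def run_p_subsequential(delta, sigma, final_out, initial, word):
    """Run deterministically, then append each of the (at most p) final
    output strings attached to the reached final state."""
    state, out = initial, ""
    for a in word:
        if (state, a) not in delta:
            return set()
        out += sigma[(state, a)]
        state = delta[(state, a)]
    return {out + tail for tail in final_out.get(state, [])}

delta = {(0, "a"): 1, (1, "a"): 2}
sigma = {(0, "a"): "a", (1, "a"): "a"}
final_out = {2: ["a", "b"]}  # two final outputs: a 2-subsequential transducer

print(run_p_subsequential(delta, sigma, final_out, 0, "aa"))  # {'aaa', 'aab'}
```

The run itself stays deterministic; the bounded ambiguity (here p = 2) appears only when the final state is reached.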

Composition If t1 is a transducer from input1 to output1 and t2 is a transducer from input2 to output2, then t1 ∘ t2 maps from input1 to output2, obtained by matching the outputs of t1 with the inputs of t2.
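For sequential functions the composed machine computes exactly "apply t1, then t2", so the composed function can be sketched pointwise. This is only a functional sketch under that assumption; the actual algorithm builds a single product transducer, which this code does not do.

```python
def make_runner(delta, sigma, initial, finals):
    """Wrap a sequential transducer as a function from strings to strings."""
    def run(word):
        state, out = initial, ""
        for a in word:
            if (state, a) not in delta:
                return None
            out += sigma[(state, a)]
            state = delta[(state, a)]
        return out if state in finals else None
    return run

def compose(f, g):
    """Pointwise composition: apply f, then g to f's output."""
    def h(word):
        mid = f(word)
        return None if mid is None else g(mid)
    return h

# f doubles every a; g rewrites a as b.
f = make_runner({(0, "a"): 0}, {(0, "a"): "aa"}, 0, {0})
g = make_runner({(0, "a"): 0}, {(0, "a"): "b"}, 0, {0})
print(compose(f, g)("aaa"))  # bbbbbb
```

Theorem 1 below says the product construction preserves sequentiality, i.e. the composite computed here is itself realizable by a single sequential machine.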

Theorem 1 Let f be a sequential (resp. p-subsequential) function and g a sequential (resp. q-subsequential) function; then g ∘ f is sequential (resp. pq-subsequential).

Proof Let f be realized by a p-subsequential transducer and g by a q-subsequential transducer. Denote the final output functions of these transducers, where, for instance, the final output function applied to a final state r represents the set of final output strings at r. One then defines a pq-subsequential transducer realizing g ∘ f.

Proof (cont.) The transition and output functions, and the final output function, of this transducer are then defined accordingly.

Theorem 2 Let f be a sequential (resp. p-subsequential) function and g a sequential (resp. q-subsequential) function; then g + f is 2-subsequential (resp. (p + q)-subsequential).

Theorem 3 Let f be a rational function. f is sequential iff there exists a positive integer K such that the stated condition holds.

Theorem 4 Let f be a partial function. f is rational iff there exist a left sequential function and a right sequential function such that f is their composition.

Transducer T with no equivalent sequential representation. Left-to-right sequential transducer L. Right-to-left sequential transducer R.

Theorem 5 Let T be a transducer. It is decidable whether T is sequential. The proof is based on the definition of a metric. Denote by u ∧ v the longest common prefix of two strings u and v. It is easy to verify that the following defines a metric: d(u, v) = |u| + |v| − 2|u ∧ v|.
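Assuming the usual prefix metric d(u, v) = |u| + |v| − 2|u ∧ v|, with u ∧ v the longest common prefix, the quantity is easy to compute and check on small cases:

```python
def lcp(u, v):
    """Longest common prefix of two strings."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

def d(u, v):
    """Prefix distance: total length minus twice the shared prefix."""
    return len(u) + len(v) - 2 * len(lcp(u, v))

print(d("abc", "abd"))  # 2: the strings share the prefix "ab"
print(d("abc", "abc"))  # 0
```

Intuitively, d measures how much of u and v lies outside their common prefix, which is why it captures the "bounded variation" conditions of the theorems that follow.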

Theorem 6 Let f be a partial function. f is subsequential iff: 1. f has bounded variation (according to the metric defined above); 2. for any rational subset Y, f^(-1)(Y) is rational.

Theorem 7 Let f be a partial function. f is p-subsequential iff: 1. f has bounded variation (using the metric d above); 2. for all i (1 ≤ i ≤ p) and any rational subset Y, f_i^(-1)(Y) is rational.

Theorem 8 Let f be a rational function. f is p-subsequential iff it has bounded variation (with respect to the corresponding semi-metric).

Application to language processing The composition, union, and equivalence algorithms for subsequential transducers are also very useful in many applications.

Representation of very large dictionaries The corresponding representation offers very fast look-up, since recognition does not depend on the size of the dictionary but only on that of the input string considered. As an example, a French morphological dictionary of about 21.2 MB can be compiled into a p-subsequential transducer of 1.3 MB in a few minutes (Mohri, 1996b).
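The look-up claim can be illustrated with a trie, the simplest deterministic-automaton view of a dictionary: recognition walks one edge per input character, so the cost depends on the word's length, not on how many entries are stored. The entries and the output format below are invented for illustration, not taken from the French dictionary cited.

```python
def build_trie(entries):
    """Build a nested-dict trie; '$' marks end-of-word and holds the output."""
    root = {}
    for word, analysis in entries.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = analysis
    return root

def lookup(trie, word):
    """One edge per character: time is linear in len(word) only."""
    node = trie
    for ch in word:
        if ch not in node:
            return None
        node = node[ch]
    return node.get("$")

trie = build_trie({"chat": "chat+N+masc+sg", "chats": "chat+N+masc+pl"})
print(lookup(trie, "chats"))  # chat+N+masc+pl
print(lookup(trie, "chien"))  # None
```

A p-subsequential transducer improves on this sketch by sharing suffixes as well as prefixes after minimization, which is where the 21.2 MB to 1.3 MB compression comes from.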

Compilation of morphological and phonological rules Similarly, context-dependent phonological and morphological rules can be represented by finite-state transducers (Kaplan and Kay, 1994). This representation considerably increases the time efficiency of the transducer, and it can be further minimized to reduce its size.

Syntax Finite-state machines are also currently used to represent local syntactic constraints (Silberztein, 1993; Roche, 1993; Karlsson et al., 1995; Mohri, 1994d). Linguists can conveniently introduce local grammar transducers that can be used to disambiguate sentences.